The conditions for an alerting policy define what is monitored and when to trigger an alert.
For example, suppose you want to define an alerting policy that emails you if the CPU utilization of a Compute Engine VM instance is above 80% for more than 3 minutes. You use the conditions dialog to specify that you want to monitor the CPU utilization of a Compute Engine VM instance, and that you want an alert to trigger when that utilization is above 80% for 3 minutes.
Before you begin
To open the Conditions pane, do the following:
Go to Stackdriver > Monitoring > Alerting > Create a Policy.
Click Add Condition.
The Title field is required. As you complete the fields in the conditions dialog, the Title field is automatically populated. You can change the auto-populated content to something more meaningful to you.
Type of Condition
The conditions dialog lets you select the type of condition that you are adding. While all conditions include a configuration that defines when an alert occurs, each type of condition has unique fields:
- A metric condition is defined by a resource type and a metric.
- An uptime-check condition is defined by a resource type and an uptime check.
- A process-health condition is defined by a resource type and a series of filters.
In the tab header, use the arrows to scroll and then click the type of condition you wish to add.
After you select the type of condition, you use the fields in the Target pane to define values for the condition's fields. For example, if you select a metric condition, the target pane includes list boxes for the resource type and metric.
When you select a target for any type of alerting policy, you are selecting a set of time series that must stay within some constraint. These time series are plotted on the chart for the condition. For more information on time series, see Metrics, time series, and resources.
Adding a metric target
A metric target is defined by a resource type and a metric. For example, you might select Compute Engine VM Instance and CPU load (15m) as the resource type and metric, respectively. To add a metric condition, do the following:
Click the Metric tab.
Click the Find resource type and metric field to bring up a drop-down list of available resource types and metrics, and then select the resource type that you want to monitor.
After you select the resource type, the list displays only metrics for that resource type. Only metrics for which data is available are listed. Scroll through the Metrics options and select the specific metric that you want your policy to monitor.
After you select the resource type and metric, this page expands to display a chart and to provide fine-grained control for your alerting condition. See Configuring a target metric for details on the new options. For additional information:
- See Using custom metrics for details on how to create your own custom metrics.
- See Overview of logs-based metrics for details on how to create metrics based on the content of log entries.
- See Sample policies for alerting policy samples and for representing alerting policies in JSON format.
You can't create a condition based on the ratio of two metrics through the UI, but you can create such policies using the API. See Metric ratio for a sample policy.
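For orientation, the CPU utilization example from the introduction might look roughly like the following when represented as JSON. This is a minimal sketch, not a definitive policy: the field names follow the Monitoring API v3 AlertPolicy resource as best understood here, the display names and the helper function cpu_threshold_policy are hypothetical, and the sketch assumes that the CPU utilization metric is reported as a fraction between 0 and 1 (so 80% is 0.8). Notification channels, which the email part of the example would need, are omitted.

```python
import json


def cpu_threshold_policy(threshold: float = 0.8, duration_seconds: int = 180) -> dict:
    """Build a dict shaped like an alerting policy with one metric-threshold condition.

    Hypothetical helper for illustration only; check the field names against
    the Sample policies page before relying on them.
    """
    return {
        "displayName": "High CPU utilization",  # hypothetical policy name
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "VM CPU utilization above threshold",  # hypothetical
                "conditionThreshold": {
                    # Target: resource type plus metric, written as a filter.
                    "filter": (
                        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
                        'AND resource.type="gce_instance"'
                    ),
                    # Configuration: "is above" the threshold for the duration.
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": f"{duration_seconds}s",
                    # Corresponds to "Any time series violates".
                    "trigger": {"count": 1},
                },
            }
        ],
    }


policy = cpu_threshold_policy()
print(json.dumps(policy, indent=2))
```

The target (resource type and metric) maps to the filter string, and the configuration (comparison, threshold, and duration) maps to the remaining fields of the condition.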
Adding an uptime-check target
We recommend creating an alerting policy for an uptime check from the Monitoring > Uptime checks page. In this case, the condition fields in the alerting policy are populated for you. See Alerting on uptime checks for details.
Adding a process-health target
A process-health target is defined by a resource type and a series of filters. You can configure this policy to trigger an incident if the number of processes that match a specific pattern rises above, or falls below, a threshold during a duration window. To add a process-health condition, do the following:
- Click the Process health tab.
In the Resource Type fields, complete the following steps:
- In the left drop-down list, select a single resource, a group of resources, or all resources.
- In the right drop-down list, select the resource type you want to monitor. For example, you might select Compute Engine VM Instance. The UI provides the list of available resource types for your system.
For the Command Line, Command, and User filters, select the fields to identify the processes that you want to monitor. In these filters, the left drop-down list selects the string-match operator and the right field specifies the query.
- The string-match operators include Equals, Contains, Ends with, and Regex. The operations are case sensitive.
- The syntax of the query depends on the operation choice. You can use wildcard operators in queries. For example, the wildcard * matches any process.
The results of the three filters are combined using the following rules:
- If you don't specify the query value for any of the filters, then all processes are counted.
- If you enter a query for one filter, only processes that match the filter are counted.
- If you enter command-line and command queries, processes that match either filter are counted. Note that command lines are truncated after 1024 characters, so text in a command line beyond that limit can't be matched against.
- If you enter a user query, processes that match the user filter and the command-line-or-command filter are counted.
As an example, to count the number of processes with nginx in their name that are owned by root, on all Compute Engine VM instances in a project, you can configure the Target region as follows:
- In the Resource type left drop-down list, select All, and for the right drop-down list, select Compute Engine VM Instance.
- In the Command Line left drop-down list, select Contains, and for the right field, enter nginx.
- Leave the Command right field empty.
- In the User left drop-down list, select Equals, and for the right field, enter root.
In the preceding figure, the graph shows an alerting threshold of one process and data for two instances. One instance has no processes that meet the filter conditions, and the other instance has two processes that meet the filter conditions.
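The rules for combining the Command Line, Command, and User filters can be easier to follow as code. The following is a minimal sketch of that logic, not the service's actual implementation: the Process class, the matches and is_counted functions, and the lowercase operator names are all hypothetical, only the operators named in this section are modeled, wildcards are not handled, and the Regex operator is assumed to behave like an unanchored search.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# A filter is an (operator, query) pair, for example ("contains", "nginx").
Filter = Tuple[str, str]

COMMAND_LINE_LIMIT = 1024  # command lines are truncated after 1024 characters


@dataclass
class Process:
    command_line: str
    command: str
    user: str


def matches(flt: Filter, value: str) -> bool:
    """Apply one case-sensitive string-match operator to a value."""
    operator, query = flt
    if operator == "equals":
        return value == query
    if operator == "contains":
        return query in value
    if operator == "ends_with":
        return value.endswith(query)
    if operator == "regex":
        # Assumption: an unanchored search; the real matching mode may differ.
        return re.search(query, value) is not None
    raise ValueError(f"unknown operator: {operator}")


def is_counted(
    proc: Process,
    command_line: Optional[Filter] = None,
    command: Optional[Filter] = None,
    user: Optional[Filter] = None,
) -> bool:
    """Return True if the process is counted under the combination rules."""
    candidates = [
        (command_line, proc.command_line[:COMMAND_LINE_LIMIT]),
        (command, proc.command),
    ]
    active = [(flt, value) for flt, value in candidates if flt is not None]
    # Command-line and command filters are alternatives: either may match.
    command_ok = any(matches(flt, value) for flt, value in active) if active else True
    # The user filter, when present, must also match.
    user_ok = matches(user, proc.user) if user is not None else True
    return command_ok and user_ok


processes = [
    Process("nginx: master process /usr/sbin/nginx", "nginx", "root"),
    Process("nginx: worker process", "nginx", "www-data"),
    Process("/usr/sbin/sshd -D", "sshd", "root"),
]
count = sum(
    is_counted(p, command_line=("contains", "nginx"), user=("equals", "root"))
    for p in processes
)
print(count)  # prints 1: only the root-owned nginx process is counted
```

With no filters supplied, is_counted returns True for every process, which corresponds to the first rule.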
After specifying the target, you have to indicate what constitutes a violation of the constraints on the target.
You use the Configuration region to define when the alerting policy triggers. The configuration region defines which time series can cause an alert to trigger and when these time series aren't meeting the policy.
For example, to configure an alerting policy to trigger if any time series is above 50 for 3 minutes, do the following:
- In the Condition triggers if drop-down list, select Any time series violates.
- In the Condition drop-down list, select is above.
- In the Threshold field, enter 50.
- In the For drop-down list, select 3 minutes.
In addition to the configuration options described in the preceding example, you can specify different subsets of the time series that can trigger the alert, and different criteria for violation.
The Condition triggers if drop-down list lets you select the subset of the targets that must violate the condition: all time series or a subset of time series. The list of options includes the following:
- Any time series violates
- Percent of time series violates
- Number of time series violates
- All time series violate
The Condition drop-down list includes the following choices:
- Is above
- Is below
- Increases by
- Decreases by
- Is absent
In the preceding example, the constraint is violated if a single time series is in violation. For the criteria for a violation, the Condition fields are set to is above and 50, and the duration is three minutes. So, this alerting policy is triggered if any time series in the target set goes above 50 and stays there for three minutes.
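To make the interaction between the Condition triggers if options and the threshold concrete, the following sketch evaluates an "is above" condition over a set of time series. It is a simplified model under stated assumptions, not how the service evaluates policies: the function names violates and condition_triggers and the mode strings are hypothetical, each time series is assumed to have exactly one sample per minute, and only the most recent samples are checked.

```python
from typing import Dict, List


def violates(samples: List[float], threshold: float, duration_minutes: int) -> bool:
    """True if the last `duration_minutes` samples are all above the threshold.

    Assumes one sample per minute, ordered oldest to newest.
    """
    if len(samples) < duration_minutes:
        return False
    return all(value > threshold for value in samples[-duration_minutes:])


def condition_triggers(
    series: Dict[str, List[float]],
    threshold: float,
    duration_minutes: int,
    mode: str = "any",
    amount: float = 0,
) -> bool:
    """Decide whether the condition triggers for the given set of time series.

    mode mirrors the drop-down options: "any", "percent", "number", or "all".
    amount is the percentage or count used by "percent" and "number".
    """
    total = len(series)
    violating = sum(
        violates(samples, threshold, duration_minutes) for samples in series.values()
    )
    if mode == "any":
        return violating >= 1
    if mode == "all":
        return total > 0 and violating == total
    if mode == "number":
        return violating >= amount
    if mode == "percent":
        return total > 0 and 100.0 * violating / total >= amount
    raise ValueError(f"unknown mode: {mode}")


# Two hypothetical time series, one sample per minute.
series = {
    "instance-a": [10, 60, 70, 80],  # above 50 for the last three minutes
    "instance-b": [10, 20, 30, 40],  # never above 50
}
print(condition_triggers(series, threshold=50, duration_minutes=3, mode="any"))
print(condition_triggers(series, threshold=50, duration_minutes=3, mode="all"))
```

With the sample data, the "any" mode triggers because instance-a stays above 50 for three minutes, while the "all" mode does not because instance-b never exceeds the threshold.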
Finish defining the condition
To complete the definition of your condition and return to the alerting policy dialog, click Save.