Annotate alerts with user-defined documentation

This page describes how you can configure your alerting policy documentation to customize the body and subject line of your notifications. The documentation fields support plain text, Markdown, variables, and channel-specific controls.

Add information to the notification

You can give your alert responders remediation steps and information about the incident by adding that content to the documentation when you create the alerting policy. For example, you might configure the alerting policy to include links to an internal playbook in any notifications.

For a sample implementation, see the Example section of this page.

Configure the subject line of notifications

You can manage and sort your notifications by specifying the subject line of those notifications. Subject lines are limited to 255 characters. If you don't define a subject in your documentation, then Cloud Monitoring determines the subject line.

You can configure the subject line when you use the Cloud Monitoring API, the Google Cloud CLI, or the Google Cloud console.

For a sample implementation, see the Example section of this page.
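As a rough illustration of the 255-character limit, the following Python sketch builds a documentation object and rejects an over-long subject before you send it. The field names (content, mimeType, subject) come from the Example section of this page; the function name build_documentation and the choice to measure the unrendered template string are assumptions, since this page doesn't say whether the limit applies before or after variables are replaced.

```python
from typing import Dict, Optional

MAX_SUBJECT_CHARS = 255  # limit stated on this page


def build_documentation(content: str, subject: Optional[str] = None) -> Dict[str, str]:
    """Return a dict shaped like the "documentation" object in the Example section.

    Hypothetical helper: it checks the subject template as written, which is
    an assumption about where the 255-character limit is applied.
    """
    doc = {"content": content, "mimeType": "text/markdown"}
    if subject is not None:
        if len(subject) > MAX_SUBJECT_CHARS:
            raise ValueError(
                f"subject has {len(subject)} characters; the limit is {MAX_SUBJECT_CHARS}"
            )
        doc["subject"] = subject
    return doc


doc = build_documentation(
    "## CPU utilization exceeded",
    subject="Alert: ${metric.display_name} exceeded",
)
```

If you omit the subject, the sketch leaves the field out, which matches the case where Cloud Monitoring determines the subject line.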

Using Markdown

The documentation field supports the following subset of Markdown tagging:

  • Headers, indicated by initial hash characters.
  • Unordered lists, indicated by initial plus, minus, or asterisk characters.
  • Ordered lists, indicated by an initial number followed by a period.
  • Italic text, indicated by single underscores or asterisks around a phrase.
  • Bold text, indicated by double underscores or asterisks around a phrase.
  • Links, indicated by [link text](url) syntax.

For more information about this syntax, see any Markdown reference, such as the Markdown Guide.

Using variables

To customize the text in your documentation, you can use variables of the form ${varname}. When the documentation is sent with a notification, the string ${varname} is replaced with a value drawn from the corresponding Google Cloud resource, as described in the following table.

Variable
  Value

condition.name
  The REST resource name of the condition, such as projects/foo/alertPolicies/1234/conditions/5678.

condition.display_name
  The display name of a condition, such as CPU usage increasing rapidly.

log.extracted_label.KEY
  The value of the label KEY, extracted from a log entry. For log-based alerts only; for more information, see Create a log-based alert (Monitoring API).

metadata.system_label.KEY
  The value of the system-supplied resource metadata label KEY. [1]

metadata.user_label.KEY
  The value of the user-defined resource metadata label KEY. [1][3]

metric.type
  The metric type, such as compute.googleapis.com/instance/cpu/utilization.

metric.display_name
  The display name for the metric type, such as CPU utilization.

metric.label.KEY
  The value of the metric label KEY. [1]
  To find the labels associated with the metric type, see Metric list.

  When the value of the variable ${metric.label.KEY} doesn't start with a digit, a letter, a forward slash (/), or an equal sign (=), Monitoring omits the label from notifications.

  When you migrate a Prometheus alerting rule, the Prometheus alert field templates {{$value}} and {{humanize $value}} appear as ${metric.label.VALUE} in the alerting policy documentation configuration. In this case, VALUE holds the value of the PromQL query.

  You can also use ${metric.label.VALUE} when you create PromQL queries in Google Cloud.

metric_or_resource.labels
  This variable renders all metric and resource label values as a sorted list of key-value pairs. If a metric label and a resource label have the same name, then only the metric label is rendered.

  When you migrate a Prometheus alerting rule, the Prometheus alert field templates {{$labels}} and {{humanize $labels}} appear as ${metric_or_resource.labels} in the alerting policy documentation configuration.

metric_or_resource.label.KEY
    • If KEY is a valid metric label, then this variable renders in the notification as the value of ${metric.label.KEY}.
    • If KEY is a valid resource label, then this variable renders in the notification as the value of ${resource.label.KEY}.
    • If KEY is neither a valid metric label nor a valid resource label, then this variable renders in the notification as an empty string.

  When you migrate a Prometheus alerting rule, the Prometheus alert field templates {{$labels.KEY}} and {{humanize $labels.KEY}} appear as ${metric_or_resource.labels.KEY} in the alerting policy documentation configuration.

policy.name
  The REST resource name of the policy, such as projects/foo/alertPolicies/1234.

policy.display_name
  The display name of a policy, such as High CPU rate of change.

policy.user_label.KEY
  The value of the user label KEY. [1]

  Keys must start with a lowercase letter. Keys and values can contain only lowercase letters, digits, underscores, and dashes.

project
  The ID of the scoping project of a metrics scope, such as a-gcp-project.

resource.type
  The monitored-resource type, such as gce_instance.

resource.project
  The project ID of the monitored resource of the alerting policy.

resource.label.KEY
  The value of the resource label KEY. [1][2][3]
  To find the labels associated with the monitored-resource type, see Resource list.

[1] For example, ${resource.label.zone} is replaced with the value of the zone label. The values of these variables are subject to grouping; see null values for more information.
[2] To retrieve the value of the project_id label on a monitored resource in the alerting policy, use ${resource.project}.
[3] You can't access user-defined resource metadata labels by using resource.label.KEY. Use metadata.user_label.KEY instead.
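The lookup rules for the metric_or_resource.labels and metric_or_resource.label.KEY entries in the table can be sketched in Python as follows. This is only an illustration of the documented behavior, not how Monitoring implements it. The function names and sample labels are hypothetical, the sketch returns a sorted list of pairs rather than guessing at the exact rendered text, and it assumes that a metric label also takes precedence in the single-key lookup when both kinds of label share a name.

```python
from typing import Dict, List, Tuple


def merged_labels(metric: Dict[str, str], resource: Dict[str, str]) -> List[Tuple[str, str]]:
    """Sorted key-value pairs; on a name clash only the metric label is kept."""
    merged = dict(resource)
    merged.update(metric)  # metric labels overwrite resource labels of the same name
    return sorted(merged.items())


def label_value(key: str, metric: Dict[str, str], resource: Dict[str, str]) -> str:
    """Metric label first, then resource label, else an empty string."""
    if key in metric:
        return metric[key]
    if key in resource:
        return resource[key]
    return ""


# Hypothetical label sets for one time series.
metric_labels = {"instance_name": "web-1", "zone": "metric-zone"}
resource_labels = {"instance_id": "1234", "zone": "us-central1-a"}

pairs = merged_labels(metric_labels, resource_labels)
```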

Usage notes

  • Only the variables in the table are supported. You cannot combine them into more complex expressions, like ${varname1 + varname2}.
  • To include the literal string ${ in your documentation, escape the $ symbol with a second $ symbol: $${ renders as ${ in your documentation.
  • These variables are replaced by their values only in notifications sent through notification channels. In the Google Cloud console, when the documentation is shown, you see the variables, not the values. Examples in the console include the descriptions of incidents and the preview of the documentation when creating an alerting policy.
  • Ensure that the aggregation settings of the condition don't eliminate the label. If the label is eliminated, then the value of the label in the notification is null. For more information, see Variable for a metric label is null.
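To preview roughly how a template might look after substitution, you could approximate the rules in these notes with a small Python function. This is a sketch under stated assumptions, not Monitoring's renderer: the render function is hypothetical, unsupported expressions are left unchanged (this page doesn't say how they appear), and a missing label value is shown as the text null.

```python
import re
from typing import Mapping, Optional

# Matches either the escape sequence "$${" or a reference of the form "${name}".
_TOKEN = re.compile(r"\$\$\{|\$\{([^}]*)\}")


def render(template: str, values: Mapping[str, Optional[str]]) -> str:
    """Approximate variable substitution for a documentation template."""

    def replace(match):
        if match.group(0) == "$${":
            return "${"  # escaped literal
        name = match.group(1)
        if name not in values:
            # Assumption: anything that isn't a known variable stays as written.
            return match.group(0)
        value = values[name]
        # Assumption: a label removed by aggregation is shown as "null".
        return "null" if value is None else value

    return _TOKEN.sub(replace, template)


sample_values = {
    "policy.display_name": "High CPU rate of change",
    "metric.label.state": None,  # label eliminated by aggregation
}
rendered = render("Policy ${policy.display_name} fired", sample_values)
```

Because substitution happens only in notifications, a local preview like this can help you review wording before you save the policy.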

Example

The following example shows Google Cloud console and Cloud Monitoring API versions of template documentation for a CPU utilization alerting policy, and the rendered documentation that appears in the body of a notification. This example uses an email for the notification channel type. The documentation template includes several variables to summarize the incident and to reference the alerting policy and condition REST resources.

Google Cloud console

## CPU utilization exceeded

### Summary

The ${metric.display_name} of the ${resource.type}
${resource.label.instance_id} in the project ${resource.project} has
exceeded 90% for over 15 minutes.

### Additional resource information

Condition resource name: ${condition.name}  
Alerting policy resource name: ${policy.name}  

### Troubleshooting and Debug References

Repository with debug scripts: example.com  
Internal troubleshooting guide: example.com  
${resource.type} dashboard: example.com

Cloud Monitoring API

"documentation": {
  "content": "## CPU utilization exceeded\n\n### Summary\n\nThe ${metric.display_name} of the ${resource.type} ${resource.label.instance_id} in the project ${resource.project} has exceeded 90% for over 15 minutes.\n\n### Additional resource information\n\nCondition resource name: ${condition.name}  \nAlerting policy resource name: ${policy.name}  \n\n### Troubleshooting and Debug References\n\nRepository with debug scripts: example.com  \nInternal troubleshooting guide: example.com  \n${resource.type} dashboard: example.com",
  "mimeType": "text/markdown",
  "subject": "Alert: ${metric.display_name} exceeded"
}
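The API version is the console version with line breaks encoded as \n inside a JSON string. If you maintain the template as Markdown text, a short Python sketch like the following can produce the JSON for you. It uses only the standard json module; the shortened template and the variable names are illustrative.

```python
import json

# A shortened version of the template above, one list item per line.
# The two trailing spaces on some lines are kept from the example.
content = "\n".join(
    [
        "## CPU utilization exceeded",
        "",
        "### Additional resource information",
        "",
        "Condition resource name: ${condition.name}  ",
        "Alerting policy resource name: ${policy.name}  ",
    ]
)

documentation = {
    "content": content,
    "mimeType": "text/markdown",
    "subject": "Alert: ${metric.display_name} exceeded",
}

# json.dumps escapes the newlines, so the result can be pasted into a policy file.
payload = json.dumps({"documentation": documentation}, indent=2)
print(payload)
```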

Format in notification

[Image: Example of how documentation renders in a notification.]

null values

Values for the metric.*, resource.* and metadata.* variables are derived from time series. Their values can be null if no values are returned from the time series query.

  • The resource.label.KEY and metric.label.KEY variables can have null values if your alerting policy uses cross-series aggregation (reduction), for example, calculating the SUM across each of the time series that match a filter. When you use cross-series aggregation, any labels not used in grouping are dropped, so they render as null when the variable is replaced with its value. All labels are retained when there is no cross-series aggregation. For more information, see Variable for a metric label is null.

  • Values for metadata.* variables are available only if the labels are explicitly included in a condition's filter or grouping for cross-series aggregation. That is, you must refer to the metadata label in either filter or grouping for it to have a value for the template.
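The following Python sketch shows why a label that isn't used in grouping has no value after cross-series aggregation. It models a SUM reduction over a few made-up time series. The sum_by function and the sample data are hypothetical, and real aggregation in Monitoring involves alignment and other steps that the sketch leaves out.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, value)


def sum_by(series: List[Series], group_by: Sequence[str]) -> List[Series]:
    """Sum values across series, keeping only the labels named in group_by."""
    totals = defaultdict(float)
    for labels, value in series:
        key = tuple(labels.get(name) for name in group_by)
        totals[key] += value
    # Each output series carries only the grouping labels.
    return [(dict(zip(group_by, key)), total) for key, total in totals.items()]


series = [
    ({"zone": "us-central1-a", "instance_id": "1"}, 0.5),
    ({"zone": "us-central1-a", "instance_id": "2"}, 0.25),
    ({"zone": "us-east1-b", "instance_id": "3"}, 0.75),
]

reduced = sum_by(series, ["zone"])
# After grouping by zone, instance_id no longer exists on any series,
# so a lookup for it finds nothing.
missing = [labels.get("instance_id") for labels, _ in reduced]
```

In this model, ${resource.label.zone} would still have a value, while ${resource.label.instance_id} would not, unless instance_id is added to the grouping.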

Variable resolution

Variables in documentation templates are resolved only in the notifications sent by using the following notification channels:

  • Email
  • Slack
  • Pub/Sub, JSON schema version 1.2
  • Webhooks, JSON schema version 1.2
  • PagerDuty, JSON schema version 1.2

Variables are not resolved, but appear as strings like ${varname}, in other contexts, including the following:

  • On the Incident details page in the Google Cloud console.
  • In notifications sent by using other notification channels.

Using channel controls

The text in the documentation field can also include special characters used by the notification channel itself to control formatting and notifications.

For example, Slack uses @ for mentions. You can use this to link the notification to a specific user ID. Suppose you include a string like this in the documentation field:

<@backendoncall> policy ${policy.display_name} triggered an incident

When the relevant Slack channel receives the documentation field as part of the notification, this line triggers an additional message to the user ID backendoncall. The message states, for example, that the policy High CPU rate of change triggered an incident. The mention must refer to a user ID, not a name.

These additional options are specific to the channels; for more information on what may be available, consult the documentation provided by the channel vendor.
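If you generate documentation text programmatically, a small helper can keep the mention syntax consistent. The following Python sketch builds a line in the form shown above. The function name and the whitespace check are assumptions made for illustration; consult the Slack documentation for the authoritative rules on user IDs.

```python
def slack_mention_line(user_id: str, message: str) -> str:
    """Prefix a documentation line with a Slack mention of the form <@USER_ID>."""
    # Assumption: a user ID is a single token, so reject empty or spaced values
    # that look like display names.
    if not user_id or any(ch.isspace() for ch in user_id):
        raise ValueError("expected a Slack user ID, not a display name")
    return f"<@{user_id}> {message}"


line = slack_mention_line(
    "backendoncall",
    "policy ${policy.display_name} triggered an incident",
)
```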