Create alert rules

This page outlines the process for creating alert rules based on logs and metrics from Google Distributed Cloud (GDC) air-gapped environments to facilitate proactive monitoring and faster incident response.

GDC lets you define alert rules based on your project's metrics or logs. These rules automatically trigger alerts when specific conditions are met. The following are the alert rule types you can define:

Metric-based rules: Trigger alerts based on numerical data collected from your applications or infrastructure. For example, you could create a rule to trigger an alert if CPU usage exceeds 80%. Create metric-based rules using the GDC console or a MonitoringRule custom resource.
Log-based rules: Trigger alerts based on the analysis of log data. These alerts let you identify and respond to specific events or patterns within your logs, such as error messages or unusual activity. Create log-based rules using the GDC console or a LoggingRule custom resource.

Both metric-based and log-based rules rely on a query language expression to define the condition that triggers an alert. This expression filters and analyzes the incoming data, evaluating whether the defined criteria are met.

The first time the condition is met, the alert transitions to the pending state. If the condition remains true for the duration that you define, the alert moves to the open state, and the system sends the alert.
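
The following fragment is a minimal sketch of how this lifecycle maps to a rule definition. It uses the field names from the MonitoringRule template later on this page. The alert name and the metric `cpu_usage_percent` are hypothetical; substitute a metric that your project actually collects.

```yaml
# Sketch only: a fragment of spec.alertRules, not a complete resource.
alertRules:
- alert: high-cpu-usage          # hypothetical alert name
  # Condition: met whenever the hypothetical metric is above 80.
  expr: cpu_usage_percent > 80
  # The alert is pending while the condition holds and moves to open
  # only after the condition has held for 300 seconds.
  for: 300s
  labels:
    severity: warning
    code: 202
    resource: AIS
```

With `for: 0s`, the pending period is skipped and the alert is sent as soon as the condition is met.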

To provide further context and facilitate efficient alert management, you can add labels and annotations to your alert rules:

Labels: Key-value pairs that categorize and identify alerts. Use labels for information like the following:
- Severity level (error, critical, warning)
- Alert code
- Resource name
Annotations: Non-identifying information that enriches the alert. Annotations can include the following information:
- Detailed error messages
- Relevant expressions
- Links to runbooks or troubleshooting guides
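
As an illustration, labels and annotations for a single alert rule might look like the following fragment. The keys `severity`, `code`, `resource`, `message`, `expression`, and `runbookurl` come from the templates later on this page. Every value, the `team` label, and the runbook URL are hypothetical.

```yaml
# Sketch only: labels and annotations for one entry in spec.alertRules.
labels:
  severity: critical
  code: 202
  resource: AIS
  team: payments                 # hypothetical optional label
annotations:
  message: "CPU usage has stayed above 80% for five minutes."
  expression: "cpu_usage_percent > 80"                  # hypothetical metric
  runbookurl: "https://example.com/runbooks/high-cpu"   # placeholder URL
```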

Before you begin

To get the permissions that you need to manage metric-based rules, ask your Organization IAM Admin or Project IAM Admin to grant you one of the associated MonitoringRule resource roles.

To get the permissions that you need to manage log-based rules, ask your Organization IAM Admin or Project IAM Admin to grant you one of the associated LoggingRule resource roles.

Depending on the level of access and permissions you need, you might obtain creator, editor, or viewer roles for these resources in an organization or a project. For more information, see Prepare IAM permissions.

Define alert rules

You can define alert rules in your project namespace using either the GDC console (preferred) or the monitoring and logging APIs to apply custom resources.

Select one of the following methods to define alert rules based on metrics or logs:

Console

Create alert rules in rule groups from the GDC console:

In the GDC console, select a project.
In the navigation menu, select Operations > Alerting.
Click the Alerting Policy tab.
Click Create Rule Group.
Choose the alert rule type:
- Select Metrics for alert rules based on metrics.
- Select Logs for alert rules based on logs.
Configure the alert rule group:
- In the Alert rule group name field, enter a name for the rule group.
- In the Rule evaluation interval field, enter the number of seconds for each interval.
- In the Limit field, enter the maximum number of alerts.
  
  Note: Enter 0 for unlimited alerts.
In the Alert rules section, click Add Rule.
In the Create alert rule window, enter the following details:
- A name for the alert rule.
- An expression for the alert rule. Use LogQL for log-based rules and PromQL for metric-based rules.
  
  Note: This expression must evaluate to true or false. The result determines whether the alert moves to the pending state.
- The duration in seconds before an alert transitions from pending to open.
  
  Note: If you set the duration to a value of 0, the system sends the alert immediately after the condition is met.
- The severity level, such as Error or Warning.
- A short name to identify the related resource.
- An alert code to identify the alert.
- A runbook URL or troubleshooting information.
- An alert message or description.
- Optional: Add Labels and Annotations as key-value pairs.
Click Save to create the rule.
Click Create to create the rule group.

The rule group appears in the Alert rule group list. You can add more alert rules to this rule group.

API

Create alert rules from the monitoring or logging APIs:

Define a MonitoringRule (metric-based rules) or LoggingRule (log-based rules) custom resource in a YAML file.

The Complete resource specification section provides templates for metric-based and log-based rules.

Replace the following values in the YAML file according to your needs:

Field	Description
`namespace`	The project namespace.
`name`	The name for the alert rule configuration.
`source`	The log source for the alert rule. Valid options are `operational` and `audit`. Only applicable for `LoggingRule` resources.
`interval`	The duration of the rule evaluation interval in seconds.
`limit`	Optional. The maximum number of alerts. Set to `0` for unlimited alerts.
`alertRules`	The definitions for creating alert rules.
`alertRules.alert`	The name of the alert.
`alertRules.expr`	A LogQL expression for log-based rules or a PromQL expression for metric-based rules. The expression must evaluate to true or false. The result determines whether the alert transitions to the pending state.
`alertRules.for`	Optional. The duration in seconds before an alert transitions from pending to open. Defaults to `0` seconds (immediate triggering).
`alertRules.labels`	Key-value pairs to categorize and identify the alert. The following labels are required: `severity`, `code`, and `resource`.
`alertRules.annotations`	Optional. Non-identifying metadata to add to the alert, as key-value pairs.

Save the YAML file.
Apply the resource configuration to the Management API server within the same namespace as your metric-based or log-based alert rules:
```
kubectl --kubeconfig KUBECONFIG_PATH apply -f ALERT_RULE_NAME.yaml
```
Replace the following:
- KUBECONFIG_PATH: the path to the kubeconfig file for the Management API server.
- ALERT_RULE_NAME: the name of the MonitoringRule or LoggingRule definition file.

Complete resource specification

This section contains the YAML templates you can use to create metric-based and log-based alert rules by applying custom resources. If you create alerts from the GDC console, you can skip this section.

Define alert rules in the following custom resources:

MonitoringRule: metric-based rules.
LoggingRule: log-based rules.

MonitoringRule

The following YAML file shows a template for the MonitoringRule custom resource. For more information, see the API reference documentation.

# Configures either an alert or a target record for precomputation.
apiVersion: monitoring.gdc.goog/v1
kind: MonitoringRule
metadata:
  # Choose a namespace that matches the project namespace.
  # The alert or record is produced in the same namespace.
  namespace: PROJECT_NAMESPACE
  name: MONITORING_RULE_NAME
spec:
  # Rule evaluation interval.
  interval: 60s

  # Configure the limit for the number of alerts.
  # A value of '0' means no limit.
  # Optional.
  # Default value: '0'
  limit: 0

  # Configure metric-based alert rules.
  alertRules:
    # Define an alert name.
  - alert: my-metric-based-alert

    # Define the PromQL expression to evaluate for this rule.
    expr: rate({service_name="bob-service"} [1m])

    # The duration in seconds before an alert transitions from pending to open.
    # Optional.
    # Default value: '0s'
    for: 0s

    # Define labels to add or overwrite.
    # Map of key-value pairs.
    # Required labels:
    #     severity: [error, critical, warning, info]
    #     code:
    #     resource: component/service/hardware related to the alert
    # Additional labels are optional.
    labels:
      severity: error
      code: 202
      resource: AIS
      another-label: another-value

    # Define annotations to add.
    # Map of key-value pairs.
    # Optional.
    # Recommended annotations:
    #     message: value of the Message field in the user interface.
    #     expression: value of the Rule field in the user interface.
    #     runbookurl: URL of the Actions to take field in the user interface.
    annotations:
      message: my-alert-message

Replace the following:

PROJECT_NAMESPACE: your project namespace.
MONITORING_RULE_NAME: the name of the MonitoringRule custom resource.
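
For comparison with the template, the following sketch shows one MonitoringRule resource that groups two alert rules with different thresholds and severities. The resource name, alert names, alert codes, and the metric `cpu_usage_percent` are hypothetical, and the thresholds are arbitrary. Treat it as a starting point under those assumptions, not as a verified configuration.

```yaml
# Sketch only: one MonitoringRule that groups two alert rules.
apiVersion: monitoring.gdc.goog/v1
kind: MonitoringRule
metadata:
  namespace: PROJECT_NAMESPACE
  name: cpu-usage-rules          # hypothetical resource name
spec:
  interval: 60s
  # Maximum number of alerts. A value of 0 would mean no limit.
  limit: 10
  alertRules:
  - alert: cpu-usage-warning
    expr: cpu_usage_percent > 80   # hypothetical metric
    for: 300s
    labels:
      severity: warning
      code: 202
      resource: AIS
    annotations:
      message: "CPU usage above 80% for five minutes."
  - alert: cpu-usage-critical
    expr: cpu_usage_percent > 95   # hypothetical metric
    for: 60s
    labels:
      severity: critical
      code: 203
      resource: AIS
    annotations:
      message: "CPU usage above 95% for one minute."
```

Using a shorter `for` value on the critical rule means the more severe condition opens an alert sooner than the warning does.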

LoggingRule

The following YAML file shows a template for the LoggingRule custom resource. For more information, see the API reference documentation.

# Configures either an alert or a target record for precomputation.
apiVersion: logging.gdc.goog/v1
kind: LoggingRule
metadata:
  # Choose a namespace that matches the project namespace.
  # The alert or record is produced in the same namespace.
  namespace: PROJECT_NAMESPACE
  name: LOGGING_RULE_NAME
spec:
  # Choose the log source to base alerts on (operational or audit logs).
  # Optional.
  # Valid options: 'operational' and 'audit'
  # Default value: 'operational'
  source: operational

  # Rule evaluation interval.
  interval: 60s

  # Configure the limit for the number of alerts.
  # A value of '0' means no limit.
  # Optional.
  # Default value: '0'
  limit: 0

  # Configure log-based alert rules.
  alertRules:
    # Define an alert name.
  - alert: my-log-based-alert

    # Define the LogQL expression to evaluate for this rule.
    expr: rate({service_name="bob-service"} [1m])

    # The duration in seconds before an alert transitions from pending to open.
    # Optional.
    # Default value: '0s'
    for: 0s

    # Define labels to add or overwrite.
    # Map of key-value pairs.
    # Required labels:
    #     severity: [error, critical, warning, info]
    #     code:
    #     resource: component/service/hardware related to the alert
    # Additional labels are optional.
    labels:
      severity: warning
      code: 202
      resource: AIS
      another-label: another-value

    # Define annotations to add.
    # Map of key-value pairs.
    # Optional.
    # Recommended annotations:
    #     message: value of the Message field in the user interface.
    #     expression: value of the Rule field in the user interface.
    #     runbookurl: URL of the Actions to take field in the user interface.
    annotations:
      message: my-alert-message
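
The following sketch fills in the LoggingRule template with an expression that includes an explicit threshold, so that the rule's condition is either met or not met. The `count_over_time` function and the `|=` line filter are standard LogQL. It is an assumption here that your GDC logging stack accepts this exact query. The resource name, alert name, and threshold are hypothetical.

```yaml
# Sketch only: a log-based rule with an explicit threshold.
apiVersion: logging.gdc.goog/v1
kind: LoggingRule
metadata:
  namespace: PROJECT_NAMESPACE
  name: bob-service-error-logs       # hypothetical resource name
spec:
  source: operational
  interval: 60s
  alertRules:
  - alert: bob-service-error-burst   # hypothetical alert name
    # Met when more than 10 log lines containing "error" were written
    # in the last 5 minutes. The threshold of 10 is arbitrary.
    expr: 'sum(count_over_time({service_name="bob-service"} |= "error" [5m])) > 10'
    for: 120s
    labels:
      severity: error
      code: 202
      resource: AIS
    annotations:
      message: "bob-service is logging errors at an elevated rate."
```

To base the alert on audit logs instead, set `source` to `audit`.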