This page outlines the process for creating alert rules based on logs and metrics from Google Distributed Cloud (GDC) air-gapped environments to facilitate proactive monitoring and faster incident response.
GDC lets you define alert rules based on your project's metrics or logs. These rules automatically trigger alerts when specific conditions are met. The following are the alert rule types you can define:
- Metric-based rules: Trigger alerts based on numerical data collected from your applications or infrastructure. For example, you could create a rule to trigger an alert if CPU usage exceeds 80%. Create metric-based rules using the GDC console or a MonitoringRule custom resource definition.
- Log-based rules: Trigger alerts based on the analysis of log data. These alerts let you identify and respond to specific events or patterns within your logs, such as error messages or unusual activity. Create log-based rules using the GDC console or a LoggingRule custom resource definition.
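To make the difference between the two rule types concrete, the following Python sketch builds both kinds of manifest as plain dictionaries. The `apiVersion` values come from the templates in the Complete resource specification section. The helper name `build_rule`, the namespace `my-project`, the metric name `cpu_usage_ratio`, and the label values are hypothetical placeholders, not part of the GDC API.

```python
import json

# API versions copied from the MonitoringRule and LoggingRule templates on this page.
API_VERSIONS = {
    "MonitoringRule": "monitoring.gdc.goog/v1",
    "LoggingRule": "logging.gdc.goog/v1",
}


def build_rule(kind, namespace, name, alert_rules, interval="60s", limit=0, source=None):
    """Return a MonitoringRule or LoggingRule manifest as a plain dict (hypothetical helper)."""
    if kind not in API_VERSIONS:
        raise ValueError(f"unsupported kind: {kind}")
    spec = {"interval": interval, "limit": limit, "alertRules": alert_rules}
    if kind == "LoggingRule":
        # Only log-based rules take a source; 'operational' is the documented default.
        spec["source"] = source or "operational"
    elif source is not None:
        raise ValueError("source is only applicable to LoggingRule resources")
    return {
        "apiVersion": API_VERSIONS[kind],
        "kind": kind,
        "metadata": {"namespace": namespace, "name": name},
        "spec": spec,
    }


# Hypothetical metric-based rule: alert when CPU usage stays above 80%.
# 'cpu_usage_ratio' is a placeholder metric name, not a documented GDC metric.
cpu_rule = build_rule(
    "MonitoringRule",
    namespace="my-project",
    name="cpu-alerts",
    alert_rules=[{
        "alert": "high-cpu",
        "expr": 'cpu_usage_ratio{service_name="bob-service"} > 0.8',
        "for": "300s",
        "labels": {"severity": "warning", "code": "101", "resource": "bob-service"},
    }],
)

if __name__ == "__main__":
    print(json.dumps(cpu_rule, indent=2))
```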
Both metric-based and log-based rules rely on a query language expression (PromQL for metric-based rules and LogQL for log-based rules) to define the condition that triggers an alert. This expression filters and analyzes the incoming data, evaluating whether the defined criteria are met.
The first time the condition is met, the alert transitions to the pending state. If the condition remains true for the duration that you define, the alert moves to the open state, and the system sends the alert at that moment.
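The following Python sketch models this lifecycle for a single alert. It assumes Prometheus-style semantics, in which a condition that stops holding while the alert is pending returns the alert to an inactive state; this page doesn't state that behavior for GDC, so treat the reset as an assumption. The `AlertState` class and `evaluate` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    state: str = "inactive"  # 'inactive', 'pending', or 'open'
    pending_since: Optional[float] = None


def evaluate(alert: AlertState, condition_met: bool, now: float, for_seconds: float) -> bool:
    """Advance the alert by one evaluation; return True when the alert is sent."""
    if not condition_met:
        # Assumption: a condition that stops holding resets the alert.
        alert.state, alert.pending_since = "inactive", None
        return False
    if alert.state == "inactive":
        # First time the condition is met: move to pending.
        alert.state, alert.pending_since = "pending", now
    if alert.state == "pending" and now - alert.pending_since >= for_seconds:
        # Condition held for the whole duration: move to open and send.
        alert.state = "open"
        return True
    return False


if __name__ == "__main__":
    alert = AlertState()
    # Hypothetical rule: evaluated every 60s with a 120s pending duration.
    for t in (0, 60, 120, 180):
        sent = evaluate(alert, condition_met=True, now=t, for_seconds=120)
        print(t, alert.state, sent)
```

With a 60-second evaluation interval and a 120-second duration, the sketch keeps the alert pending at 0 and 60 seconds, moves it to open and reports a send at 120 seconds, and reports no new send at 180 seconds while the alert stays open. With a duration of 0, the alert moves to open on the first evaluation that meets the condition.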
To provide further context and facilitate efficient alert management, you can add labels and annotations to your alert rules:
Labels: Key-value pairs that categorize and identify alerts. Use labels for information like the following:
- Severity level (error, critical, warning, or info)
- Alert code
- Resource name
Annotations: Provide additional non-identifying information to enrich the alert. Annotations can include the following information:
- Detailed error messages
- Relevant expressions
- Links to runbooks or troubleshooting guides
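The distinction between identifying labels and non-identifying annotations can be sketched in Python. The sketch assumes, as in Prometheus, that an alert's identity is derived from its labels alone, so two alerts that differ only in their annotations count as the same alert. The `alert_fingerprint` function, the label values, and the runbook URL are hypothetical.

```python
import hashlib
import json


def alert_fingerprint(labels: dict) -> str:
    """Hypothetical alert identity: a hash of the sorted label set only."""
    canonical = json.dumps(labels, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


labels = {"severity": "critical", "code": "202", "resource": "AIS"}

first = {
    "labels": labels,
    "annotations": {"message": "Disk almost full"},
}
second = {
    "labels": dict(labels),
    # Different annotations add detail but don't change which alert this is.
    "annotations": {
        "message": "Disk usage at 97%",
        "runbookurl": "https://example.com/runbooks/disk-full",
    },
}

if __name__ == "__main__":
    same = alert_fingerprint(first["labels"]) == alert_fingerprint(second["labels"])
    print("same alert:", same)
```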
Before you begin
To get the permissions that you need to manage metric-based rules, ask your Organization IAM Admin or Project IAM Admin to grant you one of the associated MonitoringRule resource roles.
To get the permissions that you need to manage log-based rules, ask your Organization IAM Admin or Project IAM Admin to grant you one of the associated LoggingRule resource roles.
Depending on the level of access and permissions you need, you might obtain creator, editor, or viewer roles for these resources in an organization or a project. For more information, see Prepare IAM permissions.
Define alert rules
You can define alert rules in your project namespace using either the GDC console (preferred) or the monitoring and logging APIs to apply custom resources.
Select one of the following methods to define alert rules based on metrics or logs:
Console
Create alert rules in rule groups from the GDC console:
- In the GDC console, select a project.
- In the navigation menu, select Operations > Alerting.
- Click the Alerting Policy tab.
- Click Create Rule Group.
- Choose the alert rule type:
  - Select Metrics for alert rules based on metrics.
  - Select Logs for alert rules based on logs.
- Configure the alert rule group:
  - In the Alert rule group name field, enter a name for the rule group.
  - In the Rule evaluation interval field, enter the number of seconds for each interval.
  - In the Limit field, enter the maximum number of alerts.
- In the Alert rules section, click Add Rule.
- In the Create alert rule window, enter the following details:
  - A name for the alert rule.
  - An expression for the alert rule. Use LogQL for log-based rules and PromQL for metric-based rules.
  - The duration in seconds before an alert transitions from pending to open.
  - The severity level, such as Error or Warning.
  - A short name to identify the related resource.
  - An alert code to identify the alert.
  - A runbook URL or troubleshooting information.
  - An alert message or description.
  - Optional: Additional labels and annotations as key-value pairs.
- Click Save to create the rule.
- Click Create to create the rule group.
The rule group appears in the Alert rule group list. You can add more alert rules to this rule group.
API
Create alert rules from the monitoring or logging APIs:
- Define a MonitoringRule (metric-based rules) or LoggingRule (log-based rules) custom resource in a YAML file. The Complete resource specification section shows templates for metric-based and log-based rules.
- Replace the following values in the YAML file according to your needs:
  - namespace: The project namespace.
  - name: The name for the alert rule configuration.
  - source: The log source for the alert rule. Valid options are operational and audit. Only applicable for LoggingRule resources.
  - interval: The duration of the rule evaluation interval in seconds.
  - limit: Optional. The maximum number of alerts. Set to 0 for unlimited alerts.
  - alertRules: The definitions for creating alert rules.
  - alertRules.alert: The name of the alert.
  - alertRules.expr: A LogQL expression for log-based rules or a PromQL expression for metric-based rules. The expression must evaluate to a true or false value to determine if the alert transitions to a pending state.
  - alertRules.for: Optional. The duration in seconds before an alert transitions from pending to open. Defaults to 0 seconds (immediate triggering).
  - alertRules.labels: Key-value pairs to categorize and identify the alert. The severity, code, and resource labels are required.
  - alertRules.annotations: Optional. Non-identifying metadata to add to the alert as key-value pairs.
- Save the YAML file.
- Apply the resource configuration to the Management API server within the same namespace as your metric-based or log-based alert rules:

  ```shell
  kubectl --kubeconfig KUBECONFIG_PATH apply -f ALERT_RULE_NAME.yaml
  ```

  Replace the following:
  - KUBECONFIG_PATH: the path to the kubeconfig file for the Management API server.
  - ALERT_RULE_NAME: the name of the MonitoringRule or LoggingRule definition file.
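If you generate rule definitions from a script, the following Python sketch writes a manifest to a file and builds the same kubectl command as an argument list. It writes JSON instead of YAML to stay within the standard library; `kubectl apply -f` accepts both formats, so the file is named `ALERT_RULE_NAME.json` rather than `.yaml`. The helper names, the example manifest, and the kubeconfig path are hypothetical.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_manifest(manifest: dict, directory: Path) -> Path:
    """Write a rule manifest to <metadata.name>.json and return its path."""
    path = directory / f"{manifest['metadata']['name']}.json"
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path


def kubectl_apply_args(kubeconfig: str, manifest_path: Path) -> list:
    """Build the kubectl apply command from this step as an argument list."""
    return ["kubectl", "--kubeconfig", kubeconfig, "apply", "-f", str(manifest_path)]


if __name__ == "__main__":
    # Hypothetical minimal manifest; fill in spec as described earlier in this procedure.
    manifest = {
        "apiVersion": "monitoring.gdc.goog/v1",
        "kind": "MonitoringRule",
        "metadata": {"namespace": "my-project", "name": "cpu-alerts"},
        "spec": {"interval": "60s", "alertRules": []},
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = write_manifest(manifest, Path(tmp))
        # Print rather than run, so the sketch works without cluster access.
        print(shlex.join(kubectl_apply_args("/path/to/kubeconfig", path)))
```

To actually apply the file, you could pass the list to `subprocess.run(args, check=True)` from a machine that can reach the Management API server.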
Complete resource specification
This section contains the YAML templates you can use to create metric-based and log-based alert rules by applying custom resources. If you create alerts from the GDC console, you can skip this section.
Define alert rules in the following custom resources:
- MonitoringRule: metric-based rules.
- LoggingRule: log-based rules.
MonitoringRule
The following YAML file shows a template for the MonitoringRule custom resource. For more information, see the API reference documentation.
```yaml
# Configures either an alert or a target record for precomputation.
apiVersion: monitoring.gdc.goog/v1
kind: MonitoringRule
metadata:
  # Choose a namespace that matches the project namespace.
  # The alert or record is produced in the same namespace.
  namespace: PROJECT_NAMESPACE
  name: MONITORING_RULE_NAME
spec:
  # Rule evaluation interval.
  interval: 60s
  # Configure the limit for the number of alerts.
  # A value of '0' means no limit.
  # Optional.
  # Default value: '0'
  limit: 0
  # Configure metric-based alert rules.
  alertRules:
    # Define an alert name.
    - alert: my-metric-based-alert
      # Define the PromQL expression to evaluate for this rule.
      expr: rate({service_name="bob-service"} [1m])
      # The duration in seconds before an alert transitions from pending to open.
      # Optional.
      # Default value: '0s'
      for: 0s
      # Define labels to add or overwrite.
      # Map of key-value pairs.
      # Required labels:
      #     severity: [error, critical, warning, info]
      #     code:
      #     resource: component/service/hardware related to the alert
      # Additional labels are optional.
      labels:
        severity: error
        code: "202"  # Quoted so that YAML parses the value as a string.
        resource: AIS
        another-label: another-value
      # Define annotations to add.
      # Map of key-value pairs.
      # Optional.
      # Recommended annotations:
      #     message: value of the Message field in the user interface.
      #     expression: value of the Rule field in the user interface.
      #     runbookurl: URL of the Actions to take field in the user interface.
      annotations:
        message: my-alert-message
```
Replace the following:
- PROJECT_NAMESPACE: your project namespace.
- MONITORING_RULE_NAME: the name of the MonitoringRule resource.
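Before applying a manifest based on the template above, you might check it against the constraints stated in its comments. The following Python sketch does that for a manifest already loaded into a dictionary (for example, with a YAML parser, which isn't shown). The `validate_rule` name is hypothetical, and the string-value check is an assumption modeled on Prometheus, where label values are strings: YAML parses an unquoted `202` as an integer.

```python
REQUIRED_LABELS = {"severity", "code", "resource"}
SEVERITIES = {"error", "critical", "warning", "info"}
SOURCES = {"operational", "audit"}


def validate_rule(manifest: dict) -> list:
    """Return human-readable problems found in a MonitoringRule or LoggingRule dict."""
    problems = []
    kind = manifest.get("kind")
    spec = manifest.get("spec", {})
    if "source" in spec:
        if kind != "LoggingRule":
            problems.append("source is only applicable to LoggingRule resources")
        elif spec["source"] not in SOURCES:
            problems.append(f"invalid source {spec['source']!r}")
    if spec.get("limit", 0) < 0:
        problems.append("limit must be 0 (no limit) or greater")
    for rule in spec.get("alertRules", []):
        name = rule.get("alert", "<unnamed>")
        if not rule.get("expr"):
            problems.append(f"{name}: expr is required")
        labels = rule.get("labels", {})
        missing = REQUIRED_LABELS - set(labels)
        if missing:
            problems.append(f"{name}: missing required labels {sorted(missing)}")
        if "severity" in labels and labels["severity"] not in SEVERITIES:
            problems.append(f"{name}: unknown severity {labels['severity']!r}")
        # Assumption: label values must be strings, so quote numbers such as "202".
        non_strings = sorted(k for k, v in labels.items() if not isinstance(v, str))
        if non_strings:
            problems.append(f"{name}: non-string label values {non_strings}")
    return problems
```

An empty list means the sketch found nothing to report; it doesn't replace any validation that the Management API server performs when you apply the resource.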
LoggingRule
The following YAML file shows a template for the LoggingRule custom resource. For more information, see the API reference documentation.
```yaml
# Configures either an alert or a target record for precomputation.
apiVersion: logging.gdc.goog/v1
kind: LoggingRule
metadata:
  # Choose a namespace that matches the project namespace.
  # The alert or record is produced in the same namespace.
  namespace: PROJECT_NAMESPACE
  name: LOGGING_RULE_NAME
spec:
  # Choose the log source to base alerts on (operational or audit logs).
  # Optional.
  # Valid options: 'operational' and 'audit'
  # Default value: 'operational'
  source: operational
  # Rule evaluation interval.
  interval: 60s
  # Configure the limit for the number of alerts.
  # A value of '0' means no limit.
  # Optional.
  # Default value: '0'
  limit: 0
  # Configure log-based alert rules.
  alertRules:
    # Define an alert name.
    - alert: my-log-based-alert
      # Define the LogQL expression to evaluate for this rule.
      expr: rate({service_name="bob-service"} [1m])
      # The duration in seconds before an alert transitions from pending to open.
      # Optional.
      # Default value: '0s'
      for: 0s
      # Define labels to add or overwrite.
      # Map of key-value pairs.
      # Required labels:
      #     severity: [error, critical, warning, info]
      #     code:
      #     resource: component/service/hardware related to the alert
      # Additional labels are optional.
      labels:
        severity: warning
        code: "202"  # Quoted so that YAML parses the value as a string.
        resource: AIS
        another-label: another-value
      # Define annotations to add.
      # Map of key-value pairs.
      # Optional.
      # Recommended annotations:
      #     message: value of the Message field in the user interface.
      #     expression: value of the Rule field in the user interface.
      #     runbookurl: URL of the Actions to take field in the user interface.
      annotations:
        message: my-alert-message
```
Replace the following:
- PROJECT_NAMESPACE: your project namespace.
- LOGGING_RULE_NAME: the name of the LoggingRule resource.
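The template's expression, `rate(...)`, returns a rate rather than an explicit comparison. As a sketch of an expression that evaluates to the true-or-false condition that `alertRules.expr` describes, the following Python helper builds a LogQL threshold query that counts matching log lines over a time window. The helper name, the `"error"` filter text, and the threshold of 10 are hypothetical; confirm which LogQL features your GDC version supports before you rely on this form.

```python
import json


def logql_threshold(selector: dict, contains: str, window: str, threshold: float) -> str:
    """Build a LogQL condition: lines containing `contains` in `window` exceed `threshold`."""
    # json.dumps wraps values in double quotes and escapes quotes and
    # backslashes, which LogQL's double-quoted strings also accept.
    matchers = ",".join(f"{key}={json.dumps(value)}" for key, value in sorted(selector.items()))
    line_filter = json.dumps(contains)
    return f"sum(count_over_time({{{matchers}}} |= {line_filter} [{window}])) > {threshold:g}"


if __name__ == "__main__":
    # Hypothetical condition: more than 10 lines containing "error" in 5 minutes.
    print(logql_threshold({"service_name": "bob-service"}, "error", "5m", 10))
    # prints: sum(count_over_time({service_name="bob-service"} |= "error" [5m])) > 10
```

The trailing comparison is what turns the line count into a condition: the alert can only move to pending while the count is above the threshold.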