AO alerting overview

This section describes the alerting features available for Application Operators (AO) in Google Distributed Cloud (GDC) air-gapped.

Alerting is an Observability service that gives you timely awareness of problems in GDC so you can resolve them quickly. An alerting policy describes the circumstances under which you want to be alerted and how you want to receive notifications.

Metric-based alerting policies track collected system monitoring data and notify specific people when a resource meets a pre-established condition. For example, an alerting policy that monitors the CPU utilization of a virtual machine (VM) might notify an on-call team when utilization meets the policy's condition. Similarly, a policy that monitors an uptime check might notify both on-call and development teams.
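
The following Python sketch illustrates the idea of a metric-based policy in general terms. It is not the GDC API: the ThresholdPolicy class, its fields, and the should_notify function are hypothetical names chosen for this example, and the rule used here (a number of consecutive samples above a threshold) is only one plausible way to define a condition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdPolicy:
    """Hypothetical alerting policy, not a GDC resource."""
    name: str
    threshold: float        # for example, 0.80 for 80% CPU utilization
    min_consecutive: int    # samples that must breach before notifying
    notify: List[str]       # hypothetical notification targets


def should_notify(policy: ThresholdPolicy, samples: List[float]) -> bool:
    """Return True when the most recent samples all exceed the threshold."""
    if len(samples) < policy.min_consecutive:
        return False
    recent = samples[-policy.min_consecutive:]
    return all(value > policy.threshold for value in recent)


# Illustrative values only.
policy = ThresholdPolicy("vm-cpu-high", 0.80, 3, ["on-call"])
if should_notify(policy, [0.50, 0.85, 0.90, 0.95]):
    print(f"{policy.name}: notify {', '.join(policy.notify)}")
```

Requiring several consecutive breaches, rather than a single one, is a common way to avoid notifying people about short spikes.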

To monitor recurring events in your logs over time, use log-based metrics to create alerting policies. Log-based metrics generate numerical data from system logging data. Log-based metrics are suitable when you want to do any of the following:

  • Count the occurrences of a message in your logs, such as a warning or error, and receive a notification when the count crosses a threshold.
  • Observe trends in your data, such as latency values in your logs, and receive a notification if the values change unacceptably.
  • Create charts to display the numeric data extracted from your logs.
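
As a rough illustration of how a log-based metric turns log lines into numbers, the Python sketch below counts ERROR entries per minute and reports the minutes in which the count crosses a threshold. The log line format, the regular expression, and the function names are assumptions made for this example; they do not describe how GDC stores logs or defines log-based metrics.

```python
import re
from collections import Counter

# Assumed log format for this sketch: "<ISO timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\S*\s+(\w+)\s")


def count_by_minute(lines, level="ERROR"):
    """Count log lines of the given level, grouped by minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == level:
            counts[match.group(1)] += 1
    return counts


def minutes_over_threshold(counts, threshold):
    """Return the minutes whose count is strictly above the threshold."""
    return sorted(minute for minute, n in counts.items() if n > threshold)


# Illustrative log lines only.
logs = [
    "2024-01-01T10:00:05Z ERROR disk full",
    "2024-01-01T10:00:20Z ERROR disk full",
    "2024-01-01T10:00:41Z INFO heartbeat",
    "2024-01-01T10:01:02Z ERROR disk full",
]
per_minute = count_by_minute(logs)
print(minutes_over_threshold(per_minute, threshold=1))
```

The per-minute counts are the numeric series that a chart could display, and the threshold check is the part an alerting policy would act on.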

The Observability platform of GDC collects incoming alerts and sends user notifications based on configuration and workflow rules for data observability. In GDC, alerts can generate pages and tickets for critical errors. Pages require immediate attention from an operator, while tickets are less urgent.