Incidents and events

An event occurs when the conditions for an alerting policy are violated. When an event occurs, Cloud Monitoring opens an incident. To view a list of incidents and events, do the following:

  1. In the Cloud Console toolbar, click Navigation menu, and then select Monitoring:

    Go to Monitoring

  2. In the Monitoring navigation pane, select Alerting.

Incidents

In the Alerting window, the Summary pane lists the number of incidents while the Incidents pane displays the 10 most recent incidents. Each incident is in one of three states:

  •  Open: The policy's set of conditions is being met, or there is no data to indicate that the condition is no longer met. If a policy contains multiple conditions, then incidents are opened depending on how those conditions are combined. See Combining conditions for more information.

  •  Acknowledged: The incident is open and has manually been marked as acknowledged. Typically, this status indicates that the incident is being investigated.

  •  Closed: The system observed that the condition stopped being met, or 7 days passed without an observation that the condition continued to be met. When an incident is closed due to the passage of time, it is considered to have "expired". It might be labeled this way in notifications or in user interfaces to differentiate it from an incident whose condition is known to no longer hold.
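The three states, and the two ways an incident can end up closed, can be sketched as a small state model. This is an illustrative sketch only: the names `IncidentState`, `Incident`, and `expired` are hypothetical and are not part of the Cloud Monitoring API, and the rule that only an open incident can be acknowledged is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class IncidentState(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    CLOSED = "closed"


@dataclass
class Incident:
    """Hypothetical model of an incident; not a Cloud Monitoring API type."""

    state: IncidentState = IncidentState.OPEN
    expired: bool = False  # True only when closed by the passage of time

    def acknowledge(self) -> None:
        # Assumption for this sketch: only an open incident can be acknowledged.
        if self.state is not IncidentState.OPEN:
            raise ValueError(f"cannot acknowledge a {self.state.value} incident")
        self.state = IncidentState.ACKNOWLEDGED

    def close(self, *, expired: bool) -> None:
        # Closing is performed by the system; `expired` records whether the
        # incident timed out rather than being observed to clear.
        self.state = IncidentState.CLOSED
        self.expired = expired
```

For example, `Incident()` starts in the open state, `acknowledge()` moves it to acknowledged, and `close(expired=True)` models an incident that closed after 7 days without data.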

Acknowledging incidents

To mark an incident as acknowledged, do the following:

  1. In the Incidents pane of the Alerting dashboard, click See all incidents. This opens the Incidents window.
  2. To acknowledge an incident, do one of the following:

    • For the incident that you want to acknowledge, select More options and then select Acknowledge.
    • Open the details page for the incident you want to acknowledge, and then click Acknowledge incident.

You must have the Monitoring Editor role, roles/monitoring.editor, to acknowledge incidents; for more information, see Access control: Predefined roles.

Silencing conditions

If you silence a condition, then all open incidents with that condition are silenced and you won't receive an alert notification when the condition stops being met. Silencing a condition removes its incidents from the active incidents display. If you are investigating an incident, you should acknowledge that incident instead of silencing its condition.

Silencing a condition doesn't resolve the underlying cause of the incident. That is, if the condition that generated the incident continues to be met on the next alerting cycle, the incident is re-opened.

To silence a condition, do the following:

  1. In the Incidents pane of the Alerting dashboard, click See all incidents. This opens the Incidents window.
  2. For the incident whose condition you want to silence, select More options and then select Silence associated condition.

Closing incidents

Incidents are closed automatically; you cannot close an incident manually. An incident is closed when the system observes that the condition is no longer being met, or when 7 days pass without an observation that the condition is still being met.

For example, assume you have an alerting policy that is configured to generate an incident if the HTTP latency is above 2 seconds for 10 consecutive minutes, and that an incident was created. If the next measurement of the HTTP latency is equal to or below 2 seconds, then the incident is closed. Similarly, if no data at all is received for 7 days, then the incident is closed.
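The closing rule in this example can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not how Cloud Monitoring is implemented: the function name `closing_reason`, its parameters, and the returned labels are hypothetical, and treating exactly 7 days as expired is an assumption about the boundary.

```python
from datetime import datetime, timedelta
from typing import Optional

LATENCY_THRESHOLD_S = 2.0    # threshold from the example policy
EXPIRY = timedelta(days=7)   # closing delay when no observation arrives


def closing_reason(last_met_at: datetime,
                   now: datetime,
                   new_latency_s: Optional[float] = None) -> Optional[str]:
    """Return why an open incident would close, or None if it stays open.

    last_met_at:   time of the last observation that the condition was met.
    new_latency_s: the next latency measurement, or None if no data arrived.
    """
    if new_latency_s is not None:
        # A measurement at or below the threshold clears the condition;
        # a measurement above it keeps the incident open.
        return "condition_cleared" if new_latency_s <= LATENCY_THRESHOLD_S else None
    if now - last_met_at >= EXPIRY:
        # No data for 7 days: the incident is closed as "expired".
        return "expired"
    return None
```

With `t = datetime(2024, 1, 1)`, the call `closing_reason(t, t + timedelta(minutes=1), 1.8)` returns `"condition_cleared"`, while `closing_reason(t, t + timedelta(days=7))` returns `"expired"`.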

Viewing and filtering incidents

The Incidents window, by default, displays open and acknowledged incidents. To view closed incidents, click Show closed incidents.

To control which incidents you see, add filters. To add a filter, do the following:

  1. Click Filter table and then select a filtering attribute:

    • State
    • Alerting policy name
    • Metric type
    • Resource type
  2. Based on the attribute you select, a second menu opens and displays a partial list of options. If you enter a value on the filter bar, the list is narrowed to the options that contain the text you entered.

    For example, to filter on the metric container.googleapis.com/container/cpu/usage_time, select the Metric type attribute. If you enter usage_time, you might see the following options in the secondary menu:

    agent.googleapis.com/cpu/usage_time
    compute.googleapis.com/guest/container/cpu/usage_time
    container.googleapis.com/container/cpu/usage_time

If you add multiple filters, an incident is displayed only if it satisfies all filters.
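The two filtering behaviors described above, narrowing the option list by the text you enter and combining multiple filters, can be sketched as follows. This is an illustration of the semantics only: `suggest_options` and `matches_all` are hypothetical helpers, incidents are represented as plain dictionaries, and case-sensitive matching is an assumption.

```python
from typing import Dict, List


def suggest_options(options: List[str], typed: str) -> List[str]:
    """Keep only the options that contain the typed text."""
    return [option for option in options if typed in option]


def matches_all(incident: Dict[str, str], filters: Dict[str, str]) -> bool:
    """An incident is displayed only if it satisfies every filter."""
    return all(incident.get(attr) == value for attr, value in filters.items())


# Sample metric types; the last one is included only to show a non-match.
metric_types = [
    "agent.googleapis.com/cpu/usage_time",
    "compute.googleapis.com/guest/container/cpu/usage_time",
    "container.googleapis.com/container/cpu/usage_time",
    "compute.googleapis.com/instance/disk/read_bytes_count",
]

narrowed = suggest_options(metric_types, "usage_time")  # first three entries

incident = {
    "state": "open",
    "metric_type": "container.googleapis.com/container/cpu/usage_time",
}
shown = matches_all(incident, {"state": "open", "metric_type": narrowed[2]})
```

Here `shown` is `True` because the incident satisfies both filters; changing either filter value to one that the incident doesn't have would hide it.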

Inspecting events

The Events pane of the Alerting dashboard displays the most recent events and includes a graphical indicator:

[Figure: Part of an events listing.]

  • To view an event's details, click the event name. The details window includes when the incident was opened, the duration, and the status.

  • To view all events, click See all events. This opens the Events window, which lists all events.

    • To page through the events, use the Forward and Backward buttons.
    • To filter the events, click Show filters. You use the filter dialog to select the types of activities, the resources, and the name. If you leave a field at the default value, then this field isn't considered.

      [Figure: Display of the event filter dialog.]

      For example, to show all activities that are open, select Opened in the Activity types menu, and leave all other fields at the default value.

The following table describes the graphical indicators:

Indicator | Meaning
Maintenance message icon | Maintenance message.
Cloud account added message icon | Cloud event message.
Database backup, config, or maintenance message icon | Database backup, configuration, or maintenance message.
Violation acknowledged, closed, or opened message icon | Violation acknowledged (blue), closed (green), or opened (red) message.
Instance migration or pre-emption, or Kubernetes message icon | Instance was migrated or pre-empted message. Kubernetes setup failure, not ready, or disk space limitation message.

What's next

  • To create and manage alerting policies with the Cloud Monitoring API or from the command line, see Using the API.
  • For a detailed conceptual treatment of alerting policies, see Alerting policies in depth.