Incidents and events

Using the Google Cloud Console

An event occurs when the conditions for an alerting policy are violated. When an event occurs, Stackdriver Monitoring opens an incident. To view a list of incidents and events, do the following:

  1. In the Cloud Console, select Monitoring.

    Go to Monitoring

  2. Select Alerting.

Incidents

The Incidents pane of the Alerting dashboard displays a partial list of incidents. The incidents are categorized under three tabs:

  • Open
  • Acknowledged
  • Closed

To view all incidents, click See all incidents. In the Incidents window, you can add filters to restrict the incidents that are displayed. To add a filter, click Filter table, and make a selection for the filter type. Based on the type you select, make a selection for the filter value or enter the value.

Open incidents

If an incident is Open, then the policy's set of conditions is currently being met, or there is no data to indicate that the conditions are no longer met.

If a policy specifies multiple conditions, incidents open depending on how the conditions are combined. See Combining conditions for more information.

Acknowledged incidents

An incident is listed in the Acknowledged tab if it is open and has been marked as acknowledged. Marking an incident as acknowledged indicates that the incident is being investigated; it doesn't change the incident's state.

To mark an incident as acknowledged, go to the incident details and click Acknowledge incident. You must have the Monitoring Editor role, roles/monitoring.editor, to acknowledge incidents; for more information, see Access control: Predefined roles.
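The relationship between acknowledgement and state can be sketched in a few lines of Python. This is a minimal illustration only: the `Incident` record, its field names, and the helper functions are hypothetical and are not part of any Monitoring API.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record; the field names are illustrative assumptions.
    name: str
    state: str = "open"        # "open" or "resolved"
    acknowledged: bool = False


def acknowledge(incident: Incident) -> None:
    """Flag the incident as being investigated; the state is left untouched."""
    incident.acknowledged = True


def in_acknowledged_tab(incident: Incident) -> bool:
    """An incident is listed as acknowledged only while it is still open."""
    return incident.state == "open" and incident.acknowledged


incident = Incident(name="high-latency")
acknowledge(incident)
print(incident.state, in_acknowledged_tab(incident))
```

The point of the sketch is that `acknowledge` sets a flag and nothing else, so an acknowledged incident is still an open incident.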

Resolved incidents

If an incident is in the Resolved state, then the policy's conditions are no longer met. An incident is also listed as resolved if there is no data to indicate whether the condition still holds and the incident has expired.

For example, assume you have an alerting policy that is triggered if the HTTP latency is above 2 seconds for 10 consecutive minutes. If the alerting policy is triggered, an event occurs and an incident is created. If the next measurement of the HTTP latency is equal to or below 2 seconds, then the incident is resolved.

You cannot manually change the state of an incident to Resolved.
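The HTTP latency example above can be replayed with a short Python sketch. It assumes one latency sample per minute, which the text does not state, and the function and constant names are made up for illustration; the actual evaluation inside Monitoring may differ.

```python
THRESHOLD_SECONDS = 2.0
DURATION_MINUTES = 10  # assumption: one latency sample arrives per minute


def replay(latencies):
    """Return (minute, "opened" | "resolved") transitions for the example policy."""
    events = []
    consecutive = 0          # consecutive samples above the threshold
    incident_open = False
    for minute, latency in enumerate(latencies, start=1):
        if latency > THRESHOLD_SECONDS:
            consecutive += 1
            if consecutive >= DURATION_MINUTES and not incident_open:
                incident_open = True
                events.append((minute, "opened"))
        else:
            consecutive = 0  # a sample at or below the threshold starts the count over
            if incident_open:
                incident_open = False
                events.append((minute, "resolved"))
    return events


samples = [2.5] * 10 + [3.0, 1.8]
print(replay(samples))  # [(10, 'opened'), (12, 'resolved')]
```

Note that a sample of exactly 2 seconds does not count as a violation, matching the "equal to or below" wording of the example.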

Inspecting events

The Events pane of the Alerting dashboard displays the most recent events and includes a graphical indicator:

Part of an events listing.

  • To view an event's details, click the event name. The details window includes when the incident was opened, the duration, and the status.

  • To view all events, click See all events. This opens the Events window, which lists all events.

    • To page through the events, use the Forward and Backward buttons.
    • To filter the events, click Show filters. Use the filter dialog to select the activity types, the resources, and the name. If you leave a field at its default value, that field isn't considered.

      Display of the event filter dialog.

      For example, to show all activities that are open, select Opened in the Activity types menu, and leave all other fields at the default value.
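The filter behavior described above, where a field left at its default is ignored, can be sketched as follows. The event dictionaries, their keys, and the `filter_events` helper are hypothetical; they only illustrate the matching rule and do not reflect how the console stores events.

```python
from typing import Optional


def filter_events(events, activity_type: Optional[str] = None,
                  resource: Optional[str] = None, name: Optional[str] = None):
    """Keep events that match every field that was set; None means 'default, ignore'."""
    criteria = {"activity_type": activity_type, "resource": resource, "name": name}
    active = {key: value for key, value in criteria.items() if value is not None}
    return [e for e in events if all(e.get(k) == v for k, v in active.items())]


# Illustrative data only.
events = [
    {"name": "High latency", "activity_type": "Opened", "resource": "web-1"},
    {"name": "High latency", "activity_type": "Resolved", "resource": "web-1"},
    {"name": "Disk full", "activity_type": "Opened", "resource": "db-1"},
]

opened = filter_events(events, activity_type="Opened")
print([e["name"] for e in opened])  # ['High latency', 'Disk full']
```

Calling `filter_events(events)` with no arguments returns every event, which mirrors leaving all fields at their defaults.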

The following table describes the graphical indicators:

Indicator | Meaning
Maintenance message icon. | Maintenance message.
Cloud account added message. | Cloud event message.
Database backup, config, or maintenance message. | Database backup, configuration, or maintenance message.
Violation acknowledged, resolved, or opened message. | Violation acknowledged (blue), resolved (green), or opened (red) message.
Instance migration or pre-emption, or Kubernetes message. | Instance was migrated or pre-empted message. Kubernetes setup failure, not ready, or disk space limitation message.

What's next

  • To create and manage alerting policies with the Stackdriver Monitoring API or from the command line, see Using the API.
  • For a detailed conceptual treatment of alerting policies, see Alerting policies in depth.

Using the Stackdriver Monitoring console

The Stackdriver Monitoring console provides information about the incidents and other events associated with alerting policies.

To view the incidents and events associated with an alerting policy, navigate to the Policies page. From the Stackdriver Monitoring console, select Alerting > Policies overview.

Incidents

An incident is a record that an alerting policy has been triggered. When events trigger a policy, Stackdriver opens an incident. The incident remains open until the policy stops triggering.

To view incidents, select Alerting > Incidents on the Stackdriver Monitoring console. This brings up a tabbed view that provides lists of open, acknowledged, and resolved incidents:

Incident states

“Open” or “Resolved”

Incidents appear in the Stackdriver Monitoring console and can be in one of two states:

  • Open: A policy's set of conditions is currently being met, based on the available data [1].

    If a policy specifies multiple conditions, incidents open depending on how the conditions are combined. See Combining conditions for more information.

  • Resolved: The policy's conditions are no longer met [2]. For example, Monitoring measured HTTP latency above two seconds for 10 consecutive minutes, but in its next measurement, latency is equal to or below two seconds. So the policy resolves the incident and resets the duration window.

    Currently, you cannot manually change the state of an incident to Resolved. The policy resolves the incident when its conditions are no longer met (either because updated measurements no longer meet the conditions or because someone changed the policy's conditions).

[1] An incident will also be listed as open if there is no data to indicate that the condition is no longer met.
[2] An incident will also be listed as resolved if there is no data to indicate whether the condition still holds and the incident has expired after 7 days of inactivity.
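The two notes on missing data can be read as a small decision rule, sketched here in Python. The function name and parameters are hypothetical, and the sketch assumes that "inactivity" is measured from the time of the last data point, which the text does not specify.

```python
from datetime import datetime, timedelta
from typing import Optional

EXPIRY = timedelta(days=7)  # expiry period stated in the note above


def incident_state(condition_met: Optional[bool],
                   last_data: datetime, now: datetime) -> str:
    """Return "open" or "resolved"; condition_met is None when there is no data."""
    if condition_met is True:
        return "open"
    if condition_met is False:
        return "resolved"
    # No data either way: the incident stays open until it expires.
    # Assumption: inactivity is counted from the last data point.
    return "resolved" if now - last_data >= EXPIRY else "open"


now = datetime(2020, 1, 10)
print(incident_state(None, datetime(2020, 1, 7), now))  # open
print(incident_state(None, datetime(2020, 1, 2), now))  # resolved
```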

In the Stackdriver Monitoring console, acknowledged incidents that are still open appear on a separate tab. This lets other people who might be notified know that someone is looking at the issue. When an incident resolves, the Stackdriver Monitoring console moves it to the Resolved tab whether or not it was acknowledged.

Acknowledging incidents

After you view an open incident, you can mark it “acknowledged” to indicate that you have seen the report and it is being worked on. Acknowledgement doesn't resolve an open incident. You must have the Monitoring Editor role, roles/monitoring.editor, to acknowledge incidents; for more information, see Access control: Predefined roles.
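As a rough illustration of the role requirement, the following Python sketch looks for a direct `roles/monitoring.editor` binding in an IAM policy represented as a dictionary of bindings. The helper name and the sample members are hypothetical, and the check is deliberately narrow: it ignores group membership and any other role that might carry the same permission.

```python
REQUIRED_ROLE = "roles/monitoring.editor"


def has_monitoring_editor(policy: dict, member: str) -> bool:
    """True if `member` is bound directly to the Monitoring Editor role."""
    return any(
        binding.get("role") == REQUIRED_ROLE
        and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )


# Illustrative policy; the members are made up.
policy = {
    "bindings": [
        {"role": "roles/monitoring.viewer", "members": ["user:alice@example.com"]},
        {"role": "roles/monitoring.editor", "members": ["user:bob@example.com"]},
    ]
}

print(has_monitoring_editor(policy, "user:bob@example.com"))    # True
print(has_monitoring_editor(policy, "user:alice@example.com"))  # False
```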

To acknowledge an incident, click on an incident on the Alerting > Incidents page. This brings up the incident-report page:

Acknowledging an incident

The policy-violation box summarizes the incident and provides an Acknowledge button. Click the button to acknowledge the incident.

The rest of the report provides more details on the incident and provides links to other useful information, including related logs.

Inspecting events

To see a list of all the events associated with the resources in your Workspace, select Alerting > Events from the Stackdriver Monitoring console.

This brings up the Events page, which shows a history of events affecting the monitored resources under the Workspace. In addition to alerting incidents, this list also includes maintenance tasks and resource stops and starts.

The following screenshot shows part of an event listing:

Part of an events listing

This list can be searched and filtered, for example, to restrict the list to a particular resource. You can also add custom events to the event record.
