Managing incidents for log-based alerts

An incident is a record of the triggering of an alerting policy. Cloud Monitoring opens an incident when a condition of an alerting policy has been met.

When your log-based alerting policy is first triggered by a matching log entry, Monitoring opens an incident and sends you a notification.

This page describes how you can view, investigate, and manage incidents for log-based alerting policies.

Finding incidents

To see a list of incidents, do the following:

  1. In the Cloud Console toolbar, click Navigation menu, and then select Monitoring:

    Go to Monitoring

  2. In the Monitoring navigation pane, select Alerting:

    • The Summary pane lists the number of open incidents.
    • The Incidents pane displays the most recent incidents. To hide closed incidents in the table, click Hide closed incidents.

Finding older incidents

The Incidents pane on the Alerting page shows the most recent open incidents. To locate older incidents, do one of the following:

  • To page through the entries in the Incidents table, click Newer or Older.

  • To navigate to the Incidents page, click See all incidents. From the Incidents page, you can do all of the following:

    • Hide closed incidents: To list only open incidents in the table, click Hide closed incidents.
    • Filter incidents: For information about adding filters, see Filtering incidents.
    • Acknowledge or close an incident: To access these options, click More options in the incident's row, and make a selection from the menu. For more information, see Managing incidents.

Filtering incidents

When you enter a value on the filter bar, only incidents that match the filter are listed in the Incidents table. If you add multiple filters, then an incident is displayed only if it satisfies all of the filters.

To add a filter to the table of incidents, do the following:

  1. On the Incidents page, click Filter table and then select a filter property. Filter properties include all of the following:

    • State of the incident
    • Name of the alerting policy
    • When the incident was opened or closed
  2. Select a value from the secondary menu or enter a value in the filter bar.
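
The way multiple filters combine can be illustrated with a short Python sketch. This is only a model of the behavior described above, not Cloud Monitoring code: the Incident record, its field names, and the filter_incidents helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Incident:
    # Hypothetical record; these fields are assumptions, not a Monitoring API.
    policy_name: str
    state: str  # "open", "acknowledged", or "closed"


# A filter is any predicate over an incident.
Filter = Callable[[Incident], bool]


def filter_incidents(incidents: Iterable[Incident],
                     filters: Iterable[Filter]) -> List[Incident]:
    """Keep only the incidents that satisfy every filter (logical AND)."""
    active_filters = list(filters)
    return [
        incident
        for incident in incidents
        if all(matches(incident) for matches in active_filters)
    ]


incidents = [
    Incident("disk-errors", "open"),
    Incident("disk-errors", "closed"),
    Incident("auth-failures", "open"),
]

# Two filters: state is open AND policy name is "disk-errors".
filters = [
    lambda incident: incident.state == "open",
    lambda incident: incident.policy_name == "disk-errors",
]

visible = filter_incidents(incidents, filters)
print(visible)
```

With both filters applied, only the open "disk-errors" incident remains. With no filters, every incident is listed.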

Investigating incidents

To view the details of an incident, you must have, at a minimum, the Identity and Access Management role of roles/monitoring.viewer. For more information, see Unable to view incident details due to a permission error.

After you have found the incident you want to investigate, go to the Incident details page for that incident. To view the details, click the incident summary in the table of incidents on either the Alerting page or the Incidents page.

Alternatively, if you received a notification that includes a link to the incident, then click that link to view the incident details.

[Screenshot: the Incident details page, which provides summary information and investigative tools for an incident.]

The Incident details page provides the following information:

  • Status information, including:

    • Name: The name of the alerting policy that caused this incident.
    • Status: The status of the incident: open, acknowledged, or closed.
    • Duration: The length of time for which the incident was open.
  • Information about the alerting policy that caused the incident:

    • Condition: The condition in the alerting policy that caused the incident. For log-based alerting policies created by using the Logs Explorer, the condition is always "Log match condition."
    • Message: A brief explanation of the cause based on the configuration of the condition in the alerting policy. This pane is always populated. For log-based alerting policies, the message always has the structure "Log matching query-filter with labels {} alert started."
    • Documentation: The (optional) documentation for notifications provided when the alerting policy was created. This information might include a description of what the alerting policy monitors and include tips for mitigation. If you skipped this field when creating the alerting policy, then the text in this pane is "No documentation is configured."
    • Labels: The labels and values for the monitored resource, included in the log entry that triggered the alerting policy. This information can help you identify the specific monitored resource that caused the incident.

  • An information box that reminds you that the incident is from a log-based alert. This box always includes the text "This incident was generated directly from log messages."

The Incident details page also provides tools for investigating the incident:

  • Links to other troubleshooting tools. The configuration of your project and alerting policy and the age of the incident determine which links are available.
    • To edit the definition of the alerting policy, click Edit policy.
    • To see related log entries in Logs Explorer, click View logs. For more information, see Using Logs Explorer.
  • Annotations: Provides a log of your findings, results, suggestions, or other comments from your investigation of the incident.
    • To add an annotation, enter text in the field and click Add comment.
    • To discard the comment, click Cancel.

You can also acknowledge or close incidents from the Incident details page. For more information, see Managing incidents.

Managing incidents

Incidents are in one of the following states:

  • Open: The log-based alerting policy was triggered, and the incident is still open. If the same alert is triggered again and there is already an incident open, then a new incident isn't opened.

  • Acknowledged: The incident is open and has manually been marked as acknowledged. Typically, this status indicates that the incident is being investigated.

  • Closed: You have manually closed the incident, or it was automatically closed after 7 days.
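
The incident states can be modeled as a small state machine. The following Python sketch is an illustration of the rules listed above, not how Monitoring is implemented: the Incident and Policy classes and their method names are hypothetical. It also assumes that the 7-day automatic closure is measured from the time the incident opened, and that an acknowledged incident still counts as open when deciding whether to open a new one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# From the text: an incident is closed automatically after 7 days.
# Assumption: the 7 days are counted from when the incident opened.
AUTO_CLOSE_AFTER = timedelta(days=7)


@dataclass
class Incident:
    """Hypothetical incident record with the three documented states."""
    opened_at: datetime
    state: str = "open"  # "open", "acknowledged", or "closed"

    def acknowledge(self) -> None:
        # Only an open incident can move to "acknowledged".
        if self.state == "open":
            self.state = "acknowledged"

    def close(self) -> None:
        self.state = "closed"

    def expire_if_due(self, now: datetime) -> None:
        # Automatic closure once the incident reaches the age limit.
        if self.state != "closed" and now - self.opened_at >= AUTO_CLOSE_AFTER:
            self.state = "closed"


class Policy:
    """Hypothetical log-based alerting policy that tracks one incident."""

    def __init__(self) -> None:
        self.current: Optional[Incident] = None

    def on_matching_log_entry(self, now: datetime) -> bool:
        """Return True if a new incident was opened for this log entry."""
        if self.current is not None:
            self.current.expire_if_due(now)
            if self.current.state != "closed":
                # An open or acknowledged incident already exists,
                # so no new incident is opened.
                return False
        self.current = Incident(opened_at=now)
        return True


start = datetime(2024, 1, 1)
policy = Policy()
print(policy.on_matching_log_entry(start))                       # first match
print(policy.on_matching_log_entry(start + timedelta(hours=1)))  # repeat match
```

In this model, the first matching log entry opens an incident, and later matches are absorbed until the incident is closed manually or reaches the 7-day limit.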

To manage incidents, your role must include the monitoring.alertPolicies.create or monitoring.alertPolicies.update permission. These permissions are included in the Monitoring Editor role, roles/monitoring.editor. For detailed information about roles and permissions, see Access control: Predefined roles.

Acknowledging incidents

We recommend that you mark an incident as acknowledged when you begin investigating the cause of the incident.

To mark an incident as acknowledged, do the following:

  1. In the Incidents pane of the Alerting dashboard, click See all incidents.
  2. On the Incidents page, find the incident that you want to acknowledge, and then do one of the following:

    • Click More options and then select Acknowledge.
    • Open the details page for the incident and then click Acknowledge incident.

Closing incidents

To close an incident, do the following:

  1. In the Incidents pane of the Alerting dashboard, click See all incidents.
  2. On the Incidents page, find the incident that you want to close, and then do one of the following:

    • Click More options and then select Close this incident.
    • Open the details page for the incident and then click Close incident.

If you see the message Unable to close incident, wait a few minutes and try again. You must wait 3 minutes before you can close a new incident.
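
The 3-minute wait can be expressed as a simple time check. The following Python sketch is illustrative only: seconds_until_closable is a hypothetical helper, and it assumes that the 3 minutes are measured from the time the incident was opened, which this page doesn't state explicitly.

```python
from datetime import datetime, timedelta

# From the text: a new incident can't be closed for 3 minutes.
# Assumption: the wait is counted from when the incident opened.
MIN_AGE_BEFORE_CLOSE = timedelta(minutes=3)


def seconds_until_closable(opened_at: datetime, now: datetime) -> float:
    """Return how many seconds remain before the incident can be closed.

    Returns 0.0 when the incident is already old enough to close.
    """
    remaining = (opened_at + MIN_AGE_BEFORE_CLOSE) - now
    return max(0.0, remaining.total_seconds())


opened = datetime(2024, 1, 1, 12, 0, 0)
print(seconds_until_closable(opened, opened + timedelta(minutes=1)))
print(seconds_until_closable(opened, opened + timedelta(minutes=5)))
```

A result greater than zero corresponds to the case where the Unable to close incident message would appear. A result of zero means the wait has elapsed.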