Responding to an alert

An incident response begins when you are alerted to an issue that requires action. If you are new to Stackdriver Incident Response and Management (IRM), this page helps you understand how to respond to an alert and then manage it through the incident lifecycle in IRM, using Google best practices.

For detailed guidance on managing an incident, go to Managing an incident.

In this guide, you use IRM to assess and mitigate an issue by creating an incident and managing it to resolution. You do the following:

  • Navigate from a notification to an alert in IRM.
  • View the details of an alert in IRM.
  • Create an incident from the alert in IRM.
  • Track your investigation by updating the incident.
  • Mark an incident Mitigated, then Resolved.

Before you begin

The Workspace you use to access the alpha release of IRM must be whitelisted. To whitelist your Workspace, submit the sign-up form.

This guide assumes that your whitelisted Workspace has received notification of a potential problem.

If you are creating a new project for IRM, your project must be part of a whitelisted Workspace and have alerting policies in place. Go to Setting up a Workspace for information on how to set up a new project that meets these requirements.

Responding to a new incident

After you receive and acknowledge an alert notification through your normal channel (for instance, email, Slack, or PagerDuty), the basic workflow for responding to an incident is as follows:

  1. Click the View Details link in the alert notification. This takes you to the Alert Details view in the IRM tool.

  2. Review the chart, alert details, and insights on the Alert Details view to determine an initial diagnosis of the situation.

  3. At this point the alert is an "Available alert": it isn't yet associated with any ongoing incident. To create a new incident, click Take action > New incident.

    This jumps you to the bottom of the page.

  4. Under Investigation updates, enter a brief update of your assessment, like "Looks like this is affecting foo service, looking through logs."

    You can add more updates as the investigation unfolds.

  5. To finish creating the incident, click Continue investigation. This takes you to the Incident Details view.

  6. Using the Severity drop-down list, select Medium.

    Recording an initial severity assessment moves the incident into the Triaged stage and helps your response team prioritize and understand the incident more quickly. You can and should update the severity as the situation changes.

  7. Use the Incident Details view to do the following:

    • Add another Investigation update (on the Investigate tab) and note your actions taken as you go.
    • Escalate the incident using the Escalate button on the Overview tab. This opens a dialog box where you can update the incident summary and add stakeholders as subscribers to the incident.
    • Add Roles (on the Overview tab) for additional responders.

  8. Change the incident Stage to Mitigated to indicate that the incident no longer impacts end users.

  9. Change the incident Stage to Resolved to indicate that the incident no longer requires an active response.
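
The stage changes in the steps above can be pictured as a small state machine. The following Python sketch is only an illustration of that lifecycle and is not the IRM API: the `Incident` class, its methods, and the forward-only ordering check are hypothetical, and `OPEN` is a placeholder because this page does not name the stage an incident is in before it is triaged.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    OPEN = "Open"            # placeholder: the initial stage is not named on this page
    TRIAGED = "Triaged"
    MITIGATED = "Mitigated"
    RESOLVED = "Resolved"


# Assumed ordering for this sketch: stages only move forward.
_ORDER = [Stage.OPEN, Stage.TRIAGED, Stage.MITIGATED, Stage.RESOLVED]


@dataclass
class Incident:
    """Hypothetical in-memory model of an incident; not an IRM type."""
    summary: str
    stage: Stage = Stage.OPEN
    severity: Optional[str] = None
    updates: List[str] = field(default_factory=list)

    def add_update(self, text: str) -> None:
        # Corresponds to adding an Investigation update.
        self.updates.append(text)

    def set_severity(self, severity: str) -> None:
        # Recording an initial severity moves the incident to Triaged.
        self.severity = severity
        if self.stage is Stage.OPEN:
            self.stage = Stage.TRIAGED

    def set_stage(self, new_stage: Stage) -> None:
        # Reject backward moves (an assumption made for this sketch).
        if _ORDER.index(new_stage) < _ORDER.index(self.stage):
            raise ValueError(
                f"cannot move from {self.stage.value} back to {new_stage.value}"
            )
        self.stage = new_stage


incident = Incident(summary="Elevated errors on foo service")
incident.add_update("Looks like this is affecting foo service, looking through logs.")
incident.set_severity("Medium")        # step 6: incident becomes Triaged
incident.set_stage(Stage.MITIGATED)    # step 8: no longer impacts end users
incident.set_stage(Stage.RESOLVED)     # step 9: no active response required
print(incident.stage.value)            # prints "Resolved"
```

In this sketch, severity can be updated at any time without changing the stage again, which mirrors the guidance to revise the severity as the situation changes.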

Next steps

For detailed guidance on managing an incident, go to Managing an incident.
