Managing an incident

This page shows you how to manage an existing incident using Stackdriver Incident Response and Management (IRM).

Getting started

Managing an existing incident can involve any of the following actions:

  • View and update information on the Overview, Investigate, and Related incidents tabs of the Incident Details view in IRM.
  • Perform frequent tasks: update the incident's severity, add roles, add tags, update the incident summary, and add investigation updates.
  • Hand off the incident, including creating a shift-handoff email.

To begin managing an incident, navigate to the Stackdriver Incident Response and Management dashboard. If you are prompted to choose a Workspace, select the appropriate one from the drop-down menu in the top navigation bar:

IRM

Click an entry in the Active incidents list. You see the Incident Details view, which has three tabs: Overview, Investigate, and Related incidents. Each tab presents the subset of information that is relevant to a particular role in the response team, and lets users add to the accumulating knowledge about the incident.

Shared information across Incidents tabs

In the Incident Details view, the Overview, Investigate, and Related incidents tabs each present some common information:

  • Title (editable): Descriptive title of the incident; the default value is the title of the first alert associated with the incident.
  • Elapsed time (read only): Number of days since the first associated alert fired.
  • Severity (editable): A classification to help the response team prioritize and understand an incident's context quickly. For details, go to severity classifications.
  • Stage (editable): The current stage of the incident's lifecycle. For details, go to incident stages.
  • Comms (editable): The URL of the primary channel where conversations about the incident are taking place.

IRM dashboard

The three tabs each organize the information and activities of the response team, according to their designated IRM roles.

As your response team goes through the incident-response process, use these tabs to add details to the incident. Collecting information as you go has been a proven best practice at Google. For example, these details can help:

  • Provide better artifacts and communication when a number of responders are involved.
  • Scale to provide stakeholders with real-time data.
  • Ensure continuity when one response team hands off to another.

Overview tab

The Overview tab displays information that is usually managed by the Incident Commander and Communications Lead roles. On this tab, you can read and edit the following:

  • Summary of the incident.
  • Tags that help classify the incident; see Adding tags for instructions on how to assign tags.
  • Links to supporting external information. For example, your issues queue, a Jira issue, or instructions for using new communication channels.
  • Roles assigned to the incident's response team members; go to View and assign roles for details.
  • Subscriptions that notify subscribers when an incident's data is updated.
  • Escalate button that lets you elevate the visibility of an incident in the IRM dashboard.

Overview tab in the Incident Details view

Investigate tab

The Investigate tab displays information that is usually managed by the Operations Lead role. This tab presents the following information:

  • Alerts: This pane contains two tabs:

    • Available alerts list: Alerts that haven't been added to an incident. It is a best practice to add alerts to incidents promptly, keeping this list as short as possible.
    • Added alerts list: Alerts that have been added to an incident. You can add multiple alerts to an incident.
  • Investigation updates: Notes on key observations, actions, and milestones during your investigation.

Investigate tab in the Incident Details view

Related incidents tab

The Related incidents tab displays information of interest to the Operations Lead role. Depending on the incident, this tab can feature two tables:

  • Similar incidents: Incidents that have something in common with the current incident, such as the underlying alerting policy. This information might provide historical context, such as how the team responded to a past occurrence of the current situation.
  • Duplicate incidents: Incidents that a user has designated as duplicates of the current incident.

To navigate to a related incident, select a row from the Related incidents table.

Mark a duplicate incident

If you think an incident duplicates another incident, you can mark it as a duplicate so that it merges with the primary (authoritative) incident.

To mark an incident as a duplicate, from any tab in the Incident Details view, do the following:

  1. Expand the More menu and select Mark incident as duplicate:

    Mark incident duplicated

    A dialog appears.

  2. Select the primary incident.

  3. Click Mark as duplicate.

    Duplicate incidents are available on the Related incidents tab.

When an incident is marked as a duplicate, some of its data is reassigned to the primary incident; this includes tags and links. Investigation updates from the duplicate incident are displayed in line with the primary incident's updates and annotated with "From duplicate".

To remove a duplicate incident from the primary incident, do the following:

  1. Click on the incident in the Duplicate incidents list.

  2. Expand the More menu.

  3. Select Detach from primary incident.

After you detach a duplicate incident, its stage becomes Triaged, so it is displayed in the Active incidents list again. Tags and links that were copied to the primary incident remain with the primary incident.
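
The merge and detach behavior described above can be illustrated with a minimal sketch. It is not an IRM API: the Incident class, its fields, and the function names are hypothetical and exist only to model the documented rules (tags and links are copied to the primary incident, updates from the duplicate are shown in line and annotated with "From duplicate", and a detached incident returns to the Triaged stage).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Incident:
    """Hypothetical in-memory stand-in for an incident (not an IRM API)."""
    title: str
    stage: str = "Detected"
    tags: set = field(default_factory=set)
    links: list = field(default_factory=list)
    updates: list = field(default_factory=list)      # (timestamp, text) pairs
    duplicates: list = field(default_factory=list)   # incidents merged into this one


def mark_as_duplicate(duplicate, primary):
    """Copy the duplicate's tags and links to the primary and record the merge."""
    primary.tags |= duplicate.tags
    for link in duplicate.links:
        if link not in primary.links:
            primary.links.append(link)
    primary.duplicates.append(duplicate)


def timeline(primary):
    """Return the primary's updates with duplicate updates shown in line."""
    entries = list(primary.updates)
    for dup in primary.duplicates:
        entries += [(ts, f"{text} (From duplicate)") for ts, text in dup.updates]
    return sorted(entries)


def detach(duplicate, primary):
    """Detach a duplicate; copied tags and links stay with the primary."""
    primary.duplicates.remove(duplicate)
    duplicate.stage = "Triaged"
```

In this sketch, detaching only removes the relationship and resets the stage. Nothing is removed from the primary incident's tags or links, which mirrors the rule that copied data remains with the primary incident.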

Frequent actions

During the lifecycle of an incident, the team frequently updates some of the incident's fields from the Incident Details view. Frequent updates include the following:

  • Update the incident Summary (Overview tab) to maintain an up-to-date description of the incident, so that stakeholders, new responders, and observers with access to the Workspace can quickly understand the current state and focus on the incident investigation.

  • Add Investigation updates (Investigate tab): Keep track of investigations, hypotheses, or other relevant information. Investigation updates contribute key information to the timeline of the incident, which is useful in postmortems for the incident.

  • Update the Severity classification (all three tabs): Communicate the user impact of the incident to response team members and observers, and support reporting and analysis. We recommend that you change the severity as conditions change and as the situation is better understood.

  • Add Tags (Overview tab): Tags provide a concise communication medium for reporting and future reference in recurrences of similar incidents. IRM provides a set of system tags to help ensure data uniformity.

Basic workflow

After an incident has been created, the basic workflow for managing an incident is as follows.

Assign a severity

Assign a severity to move the incident to the Triaged stage. You can assign a severity using the drop-down menu in the incident's toolbar:

Severity menu

You can, and should, change the severity of an incident at any point as needed.

Assign a primary communication channel

So that the response team can consolidate conversations about the incident, designate the primary channel for communication outside of the IRM console. You can assign this channel using the drop-down menu in the incident's toolbar:

Primary communications menu

Enter the URL of the primary channel where conversations about the incident are taking place. You can also change the channel's display name.

Update the incident summary

So that the response team can quickly understand the status of the incident, update the incident summary. A useful incident summary typically contains the impact, the investigation status, and the time of the next scheduled update. To update the incident summary, click Edit summary from the Overview tab.

Edit summary

Add tags

You can add tags to an incident from the Overview tab. Tags are freeform text strings that let you classify and group incidents so that you can search for them in IRM. Tags must be:

  • Unique
  • Less than 100 characters in length (for the entire string)
  • Composed only of the following characters: 0-9, A-Z, a-z, :, -, _, and =.

The IRM tool provides two predefined top-level tags for you: action and cause. The character : is the conventional hierarchy delimiter. For example, cause:dependency:foo-service:

Add tags in Overview tab

As you type in the Tags field, the input bar suggests tag names that are either relevant to your incident or that have been used in the past within this Workspace.

To remove individual tags (which are visible on the Overview tab), click the X on the tag chip.
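
The tag rules above can be checked before you type a tag into the console. The following is a minimal sketch, not part of IRM: the function names are hypothetical, and rejecting an empty string is an assumption that the rules above do not state.

```python
import re

# Allowed characters per the rules above: 0-9, A-Z, a-z, ':', '-', '_', and '='.
_TAG_PATTERN = re.compile(r"[0-9A-Za-z:\-_=]+")
MAX_TAG_LENGTH = 100  # a tag must be shorter than this (whole string)


def is_valid_tag(tag, existing_tags=()):
    """Return True if the tag satisfies the length, character, and uniqueness rules."""
    if not tag or len(tag) >= MAX_TAG_LENGTH:  # empty-tag rejection is an assumption
        return False
    if _TAG_PATTERN.fullmatch(tag) is None:
        return False
    return tag not in existing_tags


def tag_hierarchy(tag):
    """Split a tag on ':', the conventional hierarchy delimiter."""
    return tag.split(":")
```

For example, is_valid_tag("cause:dependency:foo-service") passes all three checks, and tag_hierarchy splits the same tag into its cause, dependency, and foo-service levels.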

Add links

When investigating an incident, you might want to link to relevant artifacts or data that are not in IRM.

To associate external information with an incident, do the following:

  1. Select an incident from the Active incidents section of the IRM dashboard. This takes you to the Incident Details view.
  2. Select the Overview tab.
  3. In the Links pane, click Add link. A dialog appears.
  4. In the Link type field, select URL.
  5. In the URL field, provide a valid URL.

    Note to Jira users: If you provide a URL that contains the string "jira.atlassian.com", the Link type automatically changes to Jira. You can change it back to URL if you wish. You can also manually set any URL to the Jira link type in this step.

  6. Optional: You can provide a label for your URL in the Display name field.

  7. Click Save. Your link is displayed in the Links pane.

To delete a link, expand the More menu in any row of the Links pane and select Delete.
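
The automatic link-type selection described in the note to Jira users amounts to a substring check on the URL. The sketch below is only an illustration of that rule; the function name is hypothetical, and the return values simply reuse the two link types named in the steps above.

```python
def default_link_type(url):
    """Return the link type that is preselected for a URL.

    Mirrors the documented rule: a URL that contains the string
    "jira.atlassian.com" defaults to the Jira link type; anything else
    stays a plain URL. Users can still override the result either way.
    """
    return "Jira" if "jira.atlassian.com" in url else "URL"
```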

Escalate an incident

If, upon review, you think an incident is important enough to warrant a quicker response and greater visibility among stakeholders, you can escalate it.

To escalate an incident, do the following:

  1. Select an incident from the Active incidents section of the IRM dashboard. This takes you to the Incident Details view.
  2. Select the Overview tab.
  3. In the Escalating the incident pane, select Escalate. A dialog appears. The Incident title is always prepopulated in the dialog; if the Incident severity or Incident summary has been set previously, it is also prepopulated.
  4. Select the appropriate preset from Apply preset.

    If no presets are available for the Workspace, a default None value is preselected for you. For information on creating a preset, go to Set up incident presets.

  5. Set or adjust the Incident severity (required field). For information on the possible values, review severity classifications.

  6. Optional: Edit the Incident title and Incident summary.

  7. Optional: In the Advanced fields section, unselect any fields that you don't want to apply to the incident.

  8. Click Escalate to save your edits. You see a red-triangle icon by the newly escalated incident's title, as well as any changes to the other fields.

You cannot de-escalate an incident after it has been escalated. However, when you've determined that an incident no longer requires an active response, you can mark the incident Resolved by updating the Stage drop-down menu. The incident then appears in the Resolved incidents list and no longer appears in the Active incidents list.

Assign and transfer roles

You can view and assign roles to people working on an incident within IRM:

  • Primary Responder
  • Secondary Responder
  • Operations Lead (OL)
  • Incident Commander (IC)
  • Communications Lead (CL)
  • External Customer Communications Lead
  • Other (you can specify your own role name)

Roles identify who is working on the issue. The specified person receives an email notifying them that they've been assigned a role. If you assign yourself to a role, you don't receive an email.

To assign a role, from the Overview tab of the Incident Details view, do the following:

  1. In the Roles pane, select Add role.
  2. Select a Role type and fill in an Assignee email address. The email address must already have access to the Workspace.
  3. Click Add role.

Incident roles

To delete a role from the incident, from the Roles pane, do the following:

  1. Select the role you want to delete and expand its More menu.
  2. Click Delete.

To transfer a role from one person to another, from the Roles pane, do the following:

  1. Select the role you want to transfer and expand its More menu.
  2. Select Transfer. A dialog appears.
  3. Fill in an email address in New assignee. The email address must already have access to the Workspace.
  4. Click Transfer.

Add incident subscriptions

You can create subscriptions for an incident, so that subscribers are notified when an incident's data is updated.

To add a subscription, from the Overview tab of the Incident Details view:

  1. In the Subscriptions pane, select Add subscription.
  2. Select a Channel type, either email or Slack.
  3. Fill in an Email address. The email address does not need access to the Workspace.
  4. Select one or more event types from the Notify on checklist.
  5. Click Add subscription.

Incident subscriptions

To view or edit all subscriptions for the incident (in a matrix format):

  1. In the Subscriptions pane, expand the More menu and select Manage.
  2. In the Manage subscriptions dialog, use the checkboxes to update notification events per channel.

    To delete a subscription, click the X at the end of the subscription's row.

  3. Click Update.

Add investigation updates to an incident

You can add Investigation updates to note key observations, actions, and milestones during your investigation. Add Investigation updates from the Investigate tab of the Incident Details view.

You can write investigation updates in markdown (the default), or in plain text if you unselect Use markdown. When you use markdown, you can preview the rendered content below the input field before you submit the update.

Caution: Once you submit an update, you cannot edit or delete that update.

Add alerts to an incident

IRM relies on users to add alerts to relevant incidents. Adding alerts helps scope the impact of an incident, providing situational awareness for your response team (both now and later, during the postmortem or future incidents).

You can add alerts to an incident from the Available alerts table on the Investigate tab. To add a single alert, click Add at the end of the alert's row. To add several alerts at once, select them using the checkboxes and click Add selected.

An alert that you add to the incident appears under Added alerts.

Grouping related alerts into a single incident focuses data collection, interactions, and visibility as your response team responds to an incident. When a situation involves multiple services (say a frontend, a backend, and a common dependency), you might want to create two incidents (one for the frontend and backend, one for the common dependency) because the mitigation steps, response actions, and responder groups are different.

Hand off a shift

If you are handing off your responsibilities to other responders, complete the following steps so that responsibility for the incident continues without interruption:

  1. Find replacement staff to take on the incident.
  2. Assign and transfer roles as needed.
  3. Update the incident summary.
  4. Create a shift-handoff email.

Create a shift-handoff email

Shift-handoff emails help responders communicate the latest status of recent incidents when changing on-call duties. To create a shift-handoff email from the IRM dashboard:

  1. Click Create handoff email in the Incidents list toolbar:

    Shift-handoff button

  2. You see the Shift handoff email page, which includes suggested values for recipients and subject, and lets you write your message and associate relevant incidents. You can edit any of these values before you send the email.

  3. Select one or more incidents from the Active Incidents list to include in the email.

    IRM preselects all incidents that are in the Detected or Triaged stages, as well as all incidents that have been created in the past 24 hours. Check and adjust the selection as needed; you can use the Clear selection and Restore preselection buttons, or the checkboxes in each incident's row in the list.

  4. Click Preview in the bottom right-hand corner. This generates a preview of the content before you send the email.

  5. When you are satisfied with your email draft, click Send. When your email is sent successfully, you see the confirmation message "Shift handoff sent".
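
The preselection rule in step 3 can be expressed as a simple filter. The following is a minimal sketch under stated assumptions, not IRM's implementation: incidents are modeled as plain dictionaries with hypothetical "stage" and "created" keys, and "created" is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def preselect_for_handoff(incidents, now=None):
    """Return the incidents that would be preselected for a handoff email.

    An incident is preselected if it is in the Detected or Triaged stage,
    or if it was created in the past 24 hours.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    return [
        incident for incident in incidents
        if incident["stage"] in ("Detected", "Triaged")
        or incident["created"] >= cutoff
    ]
```

Because the two conditions are combined with "or", a recently created incident is preselected even if it is already resolved. That is why it is worth checking the selection before you send the email.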

What's next

To learn how to resolve an incident in IRM, read Resolving an incident.

For descriptions of concepts that you might find useful as you begin using IRM, review Concepts.
