Incident management for high-velocity teams

The 7 stages of effective incident response

In the midst of daily operations, an IT leader suddenly receives a barrage of alerts: a service outage threatens to disrupt their system. However, the seasoned incident management team has faced similar challenges before and swiftly springs into action. By following a well-rehearsed plan and incident response best practices, they coordinate to mitigate the issue, limit damage, and restore operations, averting customer impact.

Incident response should not be reactionary. It should be a well-defined series of practices and processes that you implement when unforeseen events occur. A structured incident response lifecycle gives companies a strategic framework for swiftly identifying, reacting to, and neutralizing disruptions or security threats, so that normal operations resume promptly.

This guide will cover the incident response lifecycle and its phases, the types of security incidents, and essential tools for effective incident management. Additionally, it will address key team members, potential challenges, and insights to streamline and fortify incident response strategies.

What is incident response?

Incident response is an organization’s process of reacting to IT threats such as cyberattacks, security breaches, and server downtime.

Other IT Ops and DevOps teams may refer to the practice as major incident management or simply incident management.

The following sections describe an incident response process (what to do between realizing a service is down and getting it up and running again), based on the material in our own Incident Handbook.

In this article we’ll cover the seven key stages of incident response:

  1. Detect the incident
  2. Set up team communication channels
  3. Assess the impact and apply a severity level
  4. Communicate with customers
  5. Escalate to the right responders
  6. Delegate incident response roles
  7. Resolve the incident

How does incident response work?

Incident response workflow

Detect the incident

Ideally, monitoring and alerting tools will detect and inform your team about an incident before your customers even notice. Sometimes, though, you'll first learn about an incident from Twitter or customer support tickets.

No matter how the incident is detected, your first step should be to record that a new incident is open in a tool for tracking incidents. In an incident management solution such as Jira Service Management, alerting and communication are integrated with your tracking tool.
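As a rough illustration of "record that a new incident is open," the sketch below models a tiny in-memory incident log. It is not the API of Jira Service Management or any other product; the names `IncidentLog`, `Incident`, `DetectionSource`, and the `INC-` key prefix are all hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class DetectionSource(Enum):
    """How the team first learned about the incident (hypothetical categories)."""
    MONITORING = "monitoring"
    SOCIAL_MEDIA = "social_media"
    SUPPORT_TICKET = "support_ticket"


@dataclass
class Incident:
    key: str
    summary: str
    source: DetectionSource
    # Record the opening time in UTC so the timeline is unambiguous.
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    status: str = "open"


class IncidentLog:
    """In-memory stand-in for an incident-tracking tool (assumption, not a real API)."""

    def __init__(self, prefix: str = "INC") -> None:
        self._prefix = prefix
        self._incidents: list[Incident] = []

    def open_incident(self, summary: str, source: DetectionSource) -> Incident:
        # Keys are sequential: INC-1, INC-2, ...
        key = f"{self._prefix}-{len(self._incidents) + 1}"
        incident = Incident(key=key, summary=summary, source=source)
        self._incidents.append(incident)
        return incident

    def open_incidents(self) -> list[Incident]:
        return [i for i in self._incidents if i.status == "open"]


log = IncidentLog()
first = log.open_incident("Checkout API returning errors", DetectionSource.MONITORING)
second = log.open_incident("Customers report login failures", DetectionSource.SUPPORT_TICKET)
```

The point of the sketch is that the record is created the same way regardless of whether the signal came from monitoring, social media, or a support ticket; only the `source` field differs.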

Set up team communication channels

One of the first things the incident manager (IM) does when they come online is set up the incident team's communication channels. The goal at this point is to establish and focus all incident team communications in well-known places, such as:

  • Chat room in Slack or another messaging service.
  • Video chat in a conferencing app like Zoom (or if you're all in the same place, gather the team in a physical room).

We prefer using both video chat and a text chat tool during incidents, since both excel at different things. Video chat is great for creating a shared mental picture of the incident quickly through group discussion. And Slack helps generate a timestamped record of the incident, along with collected links to screenshots, URLs, and dashboards.

Slack and most other chat tools allow users to set a room topic. The incident manager should use this field for information about the incident and useful links.

Finally, the IM sets their own personal chat status to the issue key of the incident they are managing. This lets their colleagues know that they're busy managing an incident.
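One possible way to keep room topics consistent is to generate them from the incident record. The helper below is a minimal sketch under that assumption; `build_room_topic`, the issue key `INC-1`, and the example.com URLs are hypothetical, and the code does not call any chat tool's API.

```python
from __future__ import annotations


def build_room_topic(issue_key: str, summary: str, links: dict[str, str]) -> str:
    """Compose a one-line room topic: issue key, summary, then labeled links."""
    parts = [f"{issue_key}: {summary}"]
    parts += [f"{label}: {url}" for label, url in links.items()]
    return " | ".join(parts)


issue_key = "INC-1"  # hypothetical issue key
topic = build_room_topic(
    issue_key,
    "Checkout API returning errors",
    {
        "dashboard": "https://example.com/dashboard",
        "video": "https://example.com/call",
    },
)

# The IM's personal chat status is simply the issue key they are managing.
personal_status = issue_key
```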

Preparation

Preparation is the core of an incident response plan and determines a company’s responsiveness to an attack. A well-documented pre-incident process facilitates smooth navigation through intense, high-stress scenarios.

Any company will be more resilient with a robust incident response process based on the Atlassian Incident Handbook.

Identification

This phase involves detecting and verifying incidents through error messages, log files, and monitoring tools. Incidents might also be identified through social media or customer support tickets, in which case the response team must manually record the incident in an incident-tracking tool.

Tools like Jira Service Management centralize all alerts and incoming signals from your monitoring, service desk, and logging applications, making it easy to categorize and prioritize issues.

Containment

Once you detect an incident, containment helps prevent further damage. During containment, the response team aims to minimize the scope and effects of an incident.

Eradication

Following containment, the primary focus shifts to removing threats from the company’s network or system. This phase involves a meticulous cleansing of all systems, removing any lingering malicious content to minimize the risk of potential reinfection.

Once a comprehensive investigation is complete and the threats have been eliminated, companies can start restoring normal operations.

Recovery

After eradicating the threats, the team focuses on restoring the affected systems to their pre-incident state. Data recovery and system restoration are vital for minimizing further losses and ensuring smooth operations.

Lessons learned

Incident debriefings are crucial to refining incident response strategies. The team reviews documentation, evaluates performance, and implements changes to make incident handling more efficient. Every incident is a learning opportunity for the incident response team.

Tools for effective incident response

Teams need specialized tools, such as security information and event management (SIEM) systems, intrusion detection systems (IDS), forensic tools, and communication platforms, for streamlined incident response processes.

Tools like Jira Service Management play a critical role in reducing resolution time and negative impacts. They automatically limit noise and surface the most crucial issues to the right team using routing rules and multiple communication channels.
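To make "limit noise" and "routing rules" concrete, here is a minimal sketch of the general idea: drop duplicate alerts, order the rest by priority, and send each one to the first team whose rule matches. It is an illustration only and does not reflect how Jira Service Management implements routing; `Alert`, `RoutingRule`, `route_alerts`, and the team names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Alert:
    service: str
    message: str
    priority: int  # 1 is the most urgent (assumed convention)


@dataclass
class RoutingRule:
    team: str
    matches: Callable[[Alert], bool]


def route_alerts(
    alerts: list[Alert],
    rules: list[RoutingRule],
    default_team: str = "on-call",
) -> dict[str, list[Alert]]:
    """Deduplicate alerts, then assign each to the first matching rule's team."""
    seen: set[tuple[str, str]] = set()
    routed: dict[str, list[Alert]] = {}
    # Most urgent alerts first, so each team's list is already prioritized.
    for alert in sorted(alerts, key=lambda a: a.priority):
        fingerprint = (alert.service, alert.message)
        if fingerprint in seen:
            continue  # repeated alert: treat as noise
        seen.add(fingerprint)
        team = next((r.team for r in rules if r.matches(alert)), default_team)
        routed.setdefault(team, []).append(alert)
    return routed


rules = [
    RoutingRule(team="payments-team", matches=lambda a: a.service == "payments"),
    RoutingRule(team="platform-team", matches=lambda a: a.service == "database"),
]
alerts = [
    Alert("payments", "error rate high", 2),
    Alert("payments", "error rate high", 2),  # duplicate of the alert above
    Alert("database", "replica lag", 3),
    Alert("payments", "checkout down", 1),
    Alert("search", "slow queries", 4),  # no rule matches: goes to default team
]
routed = route_alerts(alerts, rules)
```

Real tools typically deduplicate on richer fingerprints and time windows; the exact-match fingerprint here is a simplification.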

Assess the impact and apply a severity level

After the incident team's communication channels are set up, it's time to assess the incident so the team can decide what to tell people about it and who needs to fix it.

We have the following set of questions that IMs ask their teams:

  • What is the impact to customers (internal or external)?
  • What are customers seeing?
  • How many customers are affected (some, all)?
  • When did it start?
  • How many support cases have customers opened?
  • Are there other factors, e.g. Twitter, security, or data loss?

The next step typically is to assign a severity level.
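The answers to the questions above can feed a simple severity decision. The sketch below is one hypothetical mapping, not the scheme from the Incident Handbook: the three levels, the field names on `Assessment`, and the threshold of 10 support cases are all assumptions that a team would replace with its own definitions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical three-level scale; lower number means more severe."""
    SEV1 = 1  # critical
    SEV2 = 2  # major
    SEV3 = 3  # minor


@dataclass
class Assessment:
    customers_affected: str      # "none", "some", or "all"
    customer_facing: bool        # are customers seeing the problem?
    support_cases: int           # support cases opened so far
    data_loss_or_security: bool  # other factors such as security or data loss


def assign_severity(a: Assessment, case_threshold: int = 10) -> Severity:
    # Data loss, a security issue, or a total outage is treated as critical.
    if a.data_loss_or_security or a.customers_affected == "all":
        return Severity.SEV1
    # Visible impact on some customers, or many support cases, is major.
    if a.customer_facing and (
        a.customers_affected == "some" or a.support_cases >= case_threshold
    ):
        return Severity.SEV2
    return Severity.SEV3


partial_outage = Assessment("some", customer_facing=True, support_cases=3,
                            data_loss_or_security=False)
severity = assign_severity(partial_outage)
```

Encoding the rules as a function keeps the decision repeatable under stress, although the incident manager's judgment should still override it when the situation calls for it.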

Incident response: Frequently asked questions

Why is incident response important?

A well-structured incident response plan minimizes incident impacts, enabling businesses to act swiftly and efficiently against threats. It reduces recovery time, financial loss, and reputational damage.

Who should be on an incident response team?

The incident response team should be diverse and include various roles and responsibilities. The team should include the incident commander, technical leads, communications managers, customer support leads, subject matter experts, social media leads, and problem managers. Executives and leaders across multiple domains within the company should coordinate the team.

What are some challenges of incident response?

Incident response teams often face an array of challenges, from resource constraints to issues with context, prioritization, communication, collaboration, stakeholder visibility, and the occasional human error. Preparedness is crucial to anticipate and tackle these challenges effectively. For example, involving the legal team in the preparation stage can mitigate potential legal or regulatory hurdles.