October 25, 2024

What Is Incident Response?

By Chrissy Kidd, Stephen Watts

Incident response (IR) is the set of strategic and organized actions an organization takes in the immediate aftermath of a cyberattack or security breach. The ultimate goal of your incident response actions is to reduce the risk of future incidents. As such, incident response plans aim to:

Swiftly identify the attack or incident.
Mitigate the impact of the incident.
Contain the damage.
Address the root cause.

IR involves planning, preparation, detection, containment, recovery, and remediation efforts to safeguard your organization's digital assets and minimize the adverse consequences of cybersecurity incidents.

In this article, we'll take a look at the ins and outs of incident response, including:

Frameworks for responding to incidents.
The relationship of incident response and incident management.
Tools and solutions that modern SOCs rely on for IR.

What are "security incidents"?

In cybersecurity, many kinds of incidents can threaten an organization's network. Many of them involve unauthorized intrusion: people getting into your network who should not be there. These incidents vary in their methods, intentions, and potential consequences, and they demand vigilance and robust security measures.

Understanding and preparing for these types of security incidents is crucial for organizations seeking to protect their digital assets and maintain the security and integrity of their networks. It's important to:

Implement robust security measures.
Conduct regular risk assessments.
Define an incident response plan to mitigate the impact of these incidents.

Defining incident response

NIST defines incident response as "the mitigation of violations of security policies and recommended practices." Translated: how you shut down or minimize the severity of any violations of your security posture.

Incident response vs. incident management: What's the association?

The terms "incident response" and "incident management" are often used interchangeably. While they're certainly related, they are distinct practices that complement each other. Incident response and incident management both live under the umbrella of “what to do when an incident occurs,” but they have a different scope:

Incident response focuses on the immediate actions to contain, eradicate, and recover from an incident. The goals of incident response are to:

Minimize the impact of the incident.
Protect sensitive data and components.
Return to status quo as quickly as possible.

Incident management focuses on the processes that support incident response and the aftermath of the incident. This involves classification and prioritization of an incident, collaboration between different stakeholders and teams, documentation, reporting, and deriving insights from the information to prevent the incident from repeating.
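As a rough illustration of the classification and prioritization step, here is a minimal Python sketch. The impact and urgency levels, the scoring rule, and the P1 to P3 labels are all hypothetical; a real incident management process would define its own scheme.

```python
from dataclasses import dataclass

# Hypothetical three-level scale for both impact and urgency.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Incident:
    title: str
    category: str
    impact: str   # "low", "medium", or "high"
    urgency: str  # "low", "medium", or "high"

    @property
    def priority(self) -> str:
        # Assumed rule: multiply impact by urgency and bucket the score.
        score = LEVELS[self.impact] * LEVELS[self.urgency]
        if score >= 6:
            return "P1"
        if score >= 3:
            return "P2"
        return "P3"


def triage(incidents):
    """Return incidents ordered from highest priority (P1) to lowest (P3)."""
    return sorted(incidents, key=lambda incident: incident.priority)


queue = triage([
    Incident("Phishing email reported", "phishing", "low", "medium"),
    Incident("Ransomware on file server", "malware", "high", "high"),
    Incident("Repeated failed logins", "unauthorized access", "medium", "medium"),
])
for incident in queue:
    print(incident.priority, incident.title)
```

Sorting by the priority label works here only because "P1", "P2", and "P3" happen to sort alphabetically in the intended order.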

How does shared responsibility factor into incident response?

Incident response is not solely the responsibility of the security or incident response teams. In fact, in cloud environments and across distributed systems, the shared responsibility model plays a crucial role in ensuring effective incident response. Here's what the shared responsibility model often means (though of course, each vendor contract may differ):

Cloud service providers (CSPs) are generally responsible for the security of the cloud infrastructure. This includes the physical hardware, network, and virtualization technology. They should have an effective incident response plan and should promptly notify their customers in case of an incident and provide updates on the situation.
Your organization, which handles customer data and provides products or services to its customers, is responsible for the security of the data and applications it hosts in the cloud environment. This includes implementing strong access controls, patching vulnerabilities, and regularly backing up data.

Incident response can be difficult when disparate departments are involved. Different departments may have varying priorities, workflows, security awareness, and technical proficiency. Organizations can smooth the incident response process by establishing cross-departmental IR teams that follow a centralized plan. Clearly defining roles and responsibilities and using clear communication are essential.

Why organizations need strong incident response

Incident response is critically important for organizations for a variety of reasons, detailed below.

Minimizing cybersecurity threats. Organizations face a constant and evolving threat from cyberattacks and security breaches. These threats can result in:

Data breaches
Financial losses
Damage to reputation
Legal or regulatory consequences

Incident response helps organizations prepare for, respond to, and recover from these threats effectively. (More on threat incidents later.)

Minimizing damage to internal systems and data. The quicker an organization can respond to a cybersecurity incident, the less damage it's likely to suffer. Incident response aims to identify and mitigate the impact of incidents promptly, reducing potential financial losses and operational disruption.

Protecting data and assets. If not managed effectively, incidents can result in the loss or theft of sensitive data and intellectual property. Incident response measures help protect an organization's critical assets and ensure data confidentiality, integrity, and availability.

Managing your reputation. Public perception of an organization can be significantly impacted by how it responds to a cybersecurity incident:

A well-executed incident response can help maintain or even enhance an organization's reputation.
A poorly managed incident can lead to public distrust and reputational damage.

Stakeholder trust. Customers, partners, investors, and other stakeholders expect organizations to safeguard their data and assets. Demonstrating a commitment to incident response and cybersecurity can build trust and confidence among these groups.

Legal and regulatory compliance. Many industries and jurisdictions have specific legal and regulatory requirements for incident reporting and handling. Non-compliance can lead to legal consequences, fines, and other penalties. Incident response helps organizations meet these obligations.

Operational continuity. Effective incident response can minimize disruptions to an organization's operations. By quickly identifying and containing threats, incident response helps maintain business continuity and ensures that daily operations continue as smoothly as possible.

Risk mitigation. Incident response planning includes risk assessments, helping organizations identify vulnerabilities and weaknesses. By understanding these risks, organizations can take proactive steps to prevent incidents and reduce their likelihood.

Continuous improvement. Incident response is an iterative process. Each incident provides an opportunity to learn and improve response strategies, making the organization more resilient and better prepared for future incidents.

Common cyber incidents

Let's look at some common types of cybersecurity incidents and security breaches. For more, learn about the most common cyber threats.

Unauthorized attempts to access systems or data

Unauthorized access incidents occur when an individual or a group attempts to infiltrate an organization's systems or access its data without permission. Examples include:

Hacking attempts, where attackers employ various techniques to breach defenses. (Not all hacking is malicious: ethical hacking is authorized and intentional.)
Brute force attacks, which involve trying numerous password combinations to gain entry.
Social engineering, a manipulation tactic aimed at tricking individuals into revealing sensitive information.
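To make the brute force case concrete, here is a minimal Python sketch of one way to spot it: counting failed logins per source address inside a sliding time window. The event format, the threshold of five failures, and the one-minute window are assumptions for illustration, not values from any standard.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds; tune them to your environment.
MAX_FAILURES = 5
WINDOW = timedelta(minutes=1)


def find_brute_force_sources(events, max_failures=MAX_FAILURES, window=WINDOW):
    """Return source IPs with more than max_failures failed logins in any window.

    events: iterable of (timestamp, source_ip, succeeded) tuples, sorted by time.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent failures
    flagged = set()
    for ts, source_ip, succeeded in events:
        if succeeded:
            continue
        failures = recent[source_ip]
        failures.append(ts)
        # Drop failures that have aged out of the window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) > max_failures:
            flagged.add(source_ip)
    return flagged


start = datetime(2024, 10, 25, 9, 0, 0)
# Eight failures in eight seconds from one address, one failure from another.
sample_events = [(start + timedelta(seconds=i), "203.0.113.7", False) for i in range(8)]
sample_events.append((start + timedelta(seconds=3), "198.51.100.4", False))
sample_events.sort(key=lambda event: event[0])

print(find_brute_force_sources(sample_events))
```

A real detection would also need to handle attackers who rotate source addresses or spread attempts over long periods, which this sketch does not.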

Privilege escalation attacks

Privilege escalation incidents involve an attacker:

Gaining access to a system with limited permissions.
Next, exploiting vulnerabilities or utilizing stolen credentials to acquire higher-level privileges.

This can result in unauthorized access to critical resources and data, posing a significant risk to an organization's security.

(Related reading: principle of least privilege.)

Insider threats

Insider threat incidents occur when anyone with access privileges (a current or former employee, contractor, or some other individual) misuses that access, whether maliciously or through negligence. Examples of insider threats include:

Stealing or exposing sensitive information, whether intentionally or unintentionally.
Intentionally damaging systems.
Engaging in acts of sabotage that can have severe consequences.

Insider threat diagram

Phishing attacks

Phishing incidents involve attackers sending deceptive emails or messages that appear to originate from legitimate sources but are, in reality, clever traps.

The primary objective of phishing is to deceive recipients into divulging sensitive information or to spread malware through malicious attachments or links.

Malware attacks

Malware incidents involve the use of malicious software, such as viruses or Trojan horses, to compromise an organization's systems or data.

Different types of malware serve various purposes, from gaining unauthorized access to systems to disrupting normal operations. For instance, ransomware encrypts data and demands a ransom, usually money, for its release.

(Related reading: malware detection.)

Denial-of-service (DoS) attacks

A DoS incident occurs when an attacker floods a system or network with excessive traffic, rendering it unavailable to legitimate users.

The intention is to disrupt operations and services, causing inconvenience or financial harm to the organization.

(Related reading: DDoS, distributed denial-of-service attacks.)

Man-in-the-Middle (MitM) attacks

In a MitM incident, also known as an on-path attack, an attacker intercepts and potentially alters the communication between two parties without their knowledge. This can be as simple as someone eavesdropping on a conversation between you and a colleague.

Attackers can steal sensitive information or, when online, inject malicious content into the communication, compromising the confidentiality and integrity of data.

Man in the middle attack diagram

Advanced persistent threats (APTs)

APTs are sophisticated and targeted attacks designed to gain access to an organization's systems or data.

These attacks are often orchestrated with the intention of stealing sensitive information or maintaining a long-term presence within the network, making them particularly challenging to detect and counter. Indeed, the average breach from an APT takes 150 days to be discovered.

Ransomware

Ransomware is a type of malicious software (malware) designed to encrypt a victim's files or lock them out of their computer system until a ransom is paid to the attacker. The ransom is typically demanded in cryptocurrency, such as Bitcoin, which provides a level of anonymity to the cybercriminals.

Ransomware attacks are a significant cybersecurity threat, and they can have devastating consequences for individuals, businesses, and organizations.

(Related reading: trends in ransomware.)

Incident response frameworks

Now that we understand incidents and the concept of IR, let's look at frameworks that actually help you respond to incidents effectively. We'll cover two popular frameworks, from SANS and NIST, expert cybersecurity organizations.

SANS 6 Step Incident Response Framework

The SANS Institute, a renowned organization in the field of cybersecurity, has outlined a comprehensive six-phase incident response life cycle, which provides a structured approach to handling cybersecurity incidents. These phases are designed to be repeated for each incident that occurs to continually improve an organization's incident response capabilities — and their overall security posture and readiness to respond to future threats.

Who should choose the SANS incident response framework?

The SANS incident response framework takes a highly practical approach to incident response. It’s great for organizations that focus attention on hands-on training and real-world scenarios. SANS also provides courses that combine theoretical and practical training for dealing with real-life incidents.

SANS is suitable for organizations that want their teams to follow a structured and repeatable process for incident response. The framework has clear, actionable steps that help guide even less experienced teams through an incident life cycle. It's also beneficial for organizations that want to stay ahead of emerging threats and require the latest data.

Here's an in-depth explanation of each phase.

Step 1: Prepare

In the preparation phase, the organization reviews its existing security measures, policies, and procedures to assess their effectiveness. This typically involves conducting a risk assessment to identify vulnerabilities and prioritize critical assets.

The findings from the risk assessment inform the development or refinement of incident response plans, including:

Communication plans
The assignment of roles and responsibilities for the incident response team

This phase is about enhancing the organization's readiness to respond to incidents and ensuring that high-priority assets are adequately protected.

Step 2: Identify incidents

During this phase, security teams use the tools and procedures established in the preparation phase to detect and identify suspicious or malicious activity within the organization's network and systems.

When an incident is detected, the response team works to understand:

The nature of the attack
Its source
The attacker's objectives

This phase also involves protecting and preserving any evidence related to the incident for further analysis and potential legal action. Communication plans are initiated to inform stakeholders, authorities, legal counsel, and users about the incident.

Step 3: Contain attackers and incident activity

Once an incident is confirmed, the focus shifts to containment, with the goal of limiting the damage caused by the attack. Quick containment minimizes the attacker's ability to cause further harm. Containment is usually carried out in two phases:

Short-term containment isolates immediate threats.
Long-term containment applies additional access controls to unaffected systems.

For example, this may involve segmenting off the compromised network area or taking infected servers offline while rerouting traffic to failover systems.

(Related reading: redundancy vs. resiliency.)

Step 4: Eradicate attackers and re-entry options

In this phase, the incident response team gains a comprehensive understanding of the extent of the attack and identifies all affected systems and resources. The focus is on ejecting attackers from the network and eliminating malware from compromised systems. This phase continues until all traces of the attack are removed.

Depending on the severity of the incident, some systems may need to be taken offline and replaced with clean, patched versions during the recovery phase.

Step 5: Recover from incidents, restore systems

During the recovery phase, the incident response team brings updated or replacement systems online. The goal is to return systems to normal operation. Ideally, data and systems can be restored without data loss, but in some cases, it may be necessary to recover from the last clean backup.

The recovery phase also includes monitoring systems to ensure that attackers do not return or re-exploit vulnerabilities.
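Choosing the "last clean backup" can be sketched in a few lines of Python: pick the most recent backup taken before the estimated time of compromise. The timestamps below are invented, and in practice the compromise time is itself an estimate that the eradication phase should confirm.

```python
from datetime import datetime


def last_clean_backup(backup_times, compromised_at):
    """Return the latest backup taken before the compromise, or None if none exists."""
    clean = [taken for taken in backup_times if taken < compromised_at]
    return max(clean) if clean else None


# Hypothetical nightly backups and an estimated time of compromise.
backups = [
    datetime(2024, 10, 20, 2, 0),
    datetime(2024, 10, 21, 2, 0),
    datetime(2024, 10, 22, 2, 0),
    datetime(2024, 10, 23, 2, 0),
]
compromised_at = datetime(2024, 10, 21, 17, 45)

print(last_clean_backup(backups, compromised_at))
```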

Step 6: Document lessons learned & apply feedback to the next round

The final phase involves a comprehensive review of the incident response process. Team members evaluate what worked well and what didn't, and they identify areas for improvement.

Lessons learned, along with feedback and suggestions, are documented to inform the next round of preparation. Any incomplete documentation is wrapped up during this phase. This phase is essential for continuous improvement in incident response capabilities.
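The six phases form a loop, with lessons learned feeding the next round of preparation. One way to picture that is a small Python sketch that models the phases as an ordered enumeration. The names and the `next_phase` helper are illustrative only and are not part of any SANS material.

```python
from enum import Enum


class Phase(Enum):
    PREPARE = 1
    IDENTIFY = 2
    CONTAIN = 3
    ERADICATE = 4
    RECOVER = 5
    LESSONS_LEARNED = 6


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows current, wrapping back to PREPARE at the end."""
    members = list(Phase)
    return members[(members.index(current) + 1) % len(members)]


phase = Phase.PREPARE
for _ in range(6):
    print(phase.name)
    phase = next_phase(phase)
# After six steps the cycle is back at preparation.
print("Next round starts at:", phase.name)
```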

NIST 4-phase incident response framework

In addition to SANS, the NIST framework for incident response is another popular approach. The NIST incident response cycle, described in NIST Special Publication 800-61, consists of four key phases, each with specific goals and roles in the incident response process.

Who should choose the NIST incident response framework?

NIST's incident response framework benefits organizations that need a more flexible high-level blueprint for incident response. It aligns with the regulations of various industries such as healthcare, finance, or government agencies. Therefore, it’s beneficial for organizations that focus on compliance.

NIST’s adaptable framework makes it easier to integrate into a wide range of security strategies and is also better for organizations focusing on long-term risk management alongside incident response.

Now, let's look at each phase.

Phase 1: Preparation

The preparation phase focuses on getting the organization ready to respond to cybersecurity incidents effectively. It includes:

Establishing an incident response policy, team, and communication plan.
Implementing preventative measures to reduce the risk of incidents.

In this phase, the organization assesses its risk environment, applies security best practices to systems and networks, secures the network perimeter, deploys anti-malware tools, and provides training to users. It involves creating an environment where the incident response team can quickly mobilize and coordinate their efforts when needed.

Phase 2: Detection and analysis

This phase involves identifying the type of threat an organization is facing and determining whether it constitutes an incident. It includes detecting and analyzing signs of potential incidents: indicators of compromise and indicators of attack.

During detection and analysis, the organization looks for precursors (indicators of future incidents) and indicators (evidence that an incident may be occurring or has already occurred). To detect and identify anomalies, use techniques such as:

Log analysis
Monitoring
Synchronization of system clocks, so that events from different sources can be correlated

Incidents are documented and prioritized, and this information is then used to respond effectively.
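Clock consistency matters because log analysis usually means merging events from several systems into one timeline. The following Python sketch, using invented log entries and host names, normalizes timestamps with different UTC offsets to UTC before sorting them.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


# Hypothetical entries: the firewall logs in UTC, the web server in UTC-4.
firewall_log = [
    ("2024-10-25T09:00:05+00:00", "firewall", "blocked inbound scan"),
]
server_log = [
    ("2024-10-25T04:59:58-04:00", "web-01", "failed admin login"),
    ("2024-10-25T05:00:10-04:00", "web-01", "new admin account created"),
]

timeline = sorted(
    (to_utc(ts), source, message) for ts, source, message in firewall_log + server_log
)
for when, source, message in timeline:
    print(when.isoformat(), source, message)
```

Sorted by raw local time, the two server events would appear hours before the firewall event. After normalization, the firewall block falls between them, which is the actual order of events.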

Phase 3: Containment, eradication, and recovery

The bulk of active incident response takes place in this phase. The primary objectives are to contain the threat, eradicate it, and recover affected systems to resume normal operations. Containment strategies are defined based on the type of attack and the potential damage. Incident response teams work to:

Isolate the threat.
Identify the attacking host.
Gather evidence.
Understand the attack’s behavior.

Eradication involves removing malware and compromised accounts. The recovery phase focuses on restoring systems from clean backups, implementing security patches, and improving defenses.

Phase 4: Post-incident activity

This often-overlooked phase is crucial for learning from the incident and improving future incident response efforts. It includes conducting an incident review or "lessons learned" meeting, preserving data and evidence, and revisiting preparation for future cybersecurity threats.

In the post-incident phase, the organization conducts a thorough review of the incident, documenting key findings and strategies for improvement. Data collected during the incident is preserved, and the incident response team assesses its performance against established baselines and metrics. The findings and lessons learned can inform future incident response and prevention efforts. Additionally, organizations are encouraged to share their insights with other entities to enhance collective cybersecurity knowledge.
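Assessing performance against baselines often comes down to a few durations. Here is a minimal Python sketch that derives them from four timestamps. The metric names, the sample times, and the four-hour containment target are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical internal target for containing an incident once detected.
TARGET_TIME_TO_CONTAIN = timedelta(hours=4)


def response_metrics(occurred, detected, contained, recovered):
    """Compute how long each stage of the response took."""
    return {
        "time_to_detect": detected - occurred,
        "time_to_contain": contained - detected,
        "time_to_recover": recovered - contained,
    }


metrics = response_metrics(
    occurred=datetime(2024, 10, 1, 8, 0),
    detected=datetime(2024, 10, 1, 14, 30),
    contained=datetime(2024, 10, 1, 16, 0),
    recovered=datetime(2024, 10, 2, 9, 0),
)
met_target = metrics["time_to_contain"] <= TARGET_TIME_TO_CONTAIN

for name, duration in metrics.items():
    print(name, duration)
print("Containment target met:", met_target)
```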

Incident response solutions and technologies

Commonly used incident response technologies encompass a range of tools and solutions that play crucial roles in identifying, analyzing, and mitigating security incidents. Some of these technologies and solutions are detailed below.

SIEM: Security Information and Event Management

SIEM systems serve as centralized platforms for aggregating and correlating security event data. SIEMs put together data from various internal security tools, including firewalls, vulnerability scanners, and threat intelligence feeds.

SIEM helps incident response teams sift through the vast volume of notifications generated by these tools, enabling them to focus on indicators of actual threats and reduce "alert fatigue."
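As a toy illustration of correlation, the Python sketch below groups alerts by host and surfaces only hosts flagged by at least two different tools. The alert fields and the two-tool rule are assumptions; real SIEM correlation rules are far richer.

```python
from collections import defaultdict


def correlate(alerts, min_sources=2):
    """Return hosts that were flagged by at least min_sources distinct tools."""
    tools_by_host = defaultdict(set)
    for alert in alerts:
        tools_by_host[alert["host"]].add(alert["tool"])
    return {
        host: sorted(tools)
        for host, tools in tools_by_host.items()
        if len(tools) >= min_sources
    }


# Hypothetical alerts from three separate security tools.
alerts = [
    {"host": "web-01", "tool": "firewall", "message": "port scan"},
    {"host": "web-01", "tool": "vulnerability scanner", "message": "unpatched service"},
    {"host": "web-01", "tool": "firewall", "message": "port scan"},
    {"host": "db-02", "tool": "firewall", "message": "port scan"},
    {"host": "hr-laptop-7", "tool": "antivirus", "message": "suspicious file"},
]

print(correlate(alerts))
```

Five raw alerts collapse into a single host worth investigating first, which is the kind of noise reduction that helps with alert fatigue.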

(Learn about our industry-leading SIEM, Splunk Enterprise Security.)

SOAR: Security Orchestration, Automation, and Response

SOAR technology empowers security teams to define playbooks, which are structured workflows that coordinate different security operations and tools in response to security incidents. SOAR also facilitates the automation of specific tasks within these workflows, improving efficiency in incident response.
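Conceptually, a playbook is an ordered list of named steps that act on shared incident context. Here is a minimal Python sketch of that idea. The step functions are placeholders that only update a dictionary, and none of the names correspond to a real SOAR product's API.

```python
def enrich_alert(context):
    # Placeholder for a lookup against threat intelligence.
    context["enriched"] = True


def isolate_host(context):
    # Placeholder for a call to an endpoint tool's isolation feature.
    context["isolated_hosts"] = [context["host"]]


def open_ticket(context):
    # Placeholder for creating a ticket in a tracking system.
    context["ticket"] = f"Investigate {context['alert']} on {context['host']}"


# Hypothetical playbook: each entry is a (step name, action) pair.
PHISHING_PLAYBOOK = [
    ("enrich alert", enrich_alert),
    ("isolate host", isolate_host),
    ("open ticket", open_ticket),
]


def run_playbook(steps, context):
    """Run each step in order and return the names of the completed steps."""
    completed = []
    for name, action in steps:
        action(context)
        completed.append(name)
    return completed


context = {"alert": "suspicious attachment", "host": "laptop-042"}
completed_steps = run_playbook(PHISHING_PLAYBOOK, context)
print(completed_steps)
print(context["ticket"])
```

Keeping steps as data rather than hard-coded calls makes it simple to reorder them or to insert a manual approval step before a disruptive action such as host isolation.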

(Learn more: SIEM vs SOAR: What’s The Difference?)

EDR and XDR

Endpoint detection and response (EDR) software is designed to provide automatic protection for an organization's end users, endpoint devices, and IT assets against cyberthreats that can bypass traditional antivirus software and other endpoint security tools. It continuously collects data from all network endpoints, analyzing it in real time to detect known or suspected cyberthreats and respond automatically to prevent or minimize potential damage.

Extended Detection and Response (XDR) is a cybersecurity technology that unifies security tools, data sources, telemetry, and analytics across various parts of the hybrid IT environment, including endpoints, networks, and both private and public clouds. It aims to create a centralized system for threat prevention, detection, and response, helping security teams and Security Operations Centers (SOCs) streamline their efforts by eliminating tool silos and automating responses throughout the entire cyberthreat kill chain.

Though more "modern" than EDR, XDR platforms do have certain limitations, primarily:

The smaller range of integrated solutions they work with.
The data set they can analyze.

These drawbacks can restrict security teams’ ability to use existing or new security solutions of their choice with an XDR platform. Security teams may also encounter blind spots due to XDR solutions’ limited security data coverage, especially when using XDR as the primary security operations platform.

UEBA: User and Entity Behavior Analytics

UEBA leverages behavioral analytics, machine learning algorithms, and automation to identify abnormal and potentially hazardous user and device behavior. It's particularly effective at detecting insider threats, such as malicious insiders or hackers using compromised insider credentials.

UEBA functionality is often integrated into SIEM, EDR, and XDR solutions, enhancing their capabilities in identifying and responding to security incidents.
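The core idea behind behavioral analytics can be sketched in Python with a simple statistical baseline: compare today's activity with a user's own history and flag large deviations. The z-score approach, the threshold of 3, and the sample download volumes are illustrative assumptions; production UEBA tools use much richer models.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag observed if it lies more than threshold standard deviations from the mean."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # No variation in the history: any change at all is unusual.
        return observed != baseline
    return abs(observed - baseline) / spread > threshold


# Hypothetical daily download volume (in MB) for one user over a week.
daily_downloads_mb = [12, 15, 11, 14, 13, 12, 16]

print(is_anomalous(daily_downloads_mb, 14))   # a typical day
print(is_anomalous(daily_downloads_mb, 250))  # a sudden bulk download
```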

(Learn more: Splunk User Behavior Analytics.)

ASM: Attack surface management

ASM solutions automate the continuous process of discovering, analyzing, remediating, and monitoring vulnerabilities and potential attack vectors across an organization's entire attack surface. These solutions can uncover previously unmonitored network assets, establish relationships between assets, and provide essential insights to enhance overall security.
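At its simplest, uncovering unmonitored assets means comparing what a discovery scan finds with what the asset inventory says should exist. Here is a minimal Python sketch of that comparison. The host names are invented, and real ASM tools perform the discovery itself, which this example takes as given.

```python
def find_unmonitored(discovered, inventory):
    """Return assets seen on the network that are missing from the inventory."""
    return sorted(set(discovered) - set(inventory))


# Hypothetical results of a discovery scan and the current asset inventory.
discovered_assets = ["web-01", "db-02", "test-server-old", "printer-3f"]
asset_inventory = ["web-01", "db-02", "mail-01"]

print(find_unmonitored(discovered_assets, asset_inventory))
```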

These incident response technologies play crucial roles in helping organizations bolster their cybersecurity efforts, detect and respond to threats more effectively, and manage their attack surface to reduce vulnerabilities and potential attack vectors.

Responding to incidents can be easier

Incident response is essential for organizations to protect themselves from the ever-present and evolving threats in the digital landscape. It helps organizations safeguard their data, minimize damage, maintain trust, and meet legal and regulatory obligations. A well-executed incident response strategy is a cornerstone of modern cybersecurity risk management.

Splunk is helping enterprise organizations around the world build their digital resilience. Explore Splunk's industry-leading products and solutions for cybersecurity, monitoring and observability, and data management.

See an error or have a suggestion? Please let us know by emailing ssg-blogs@splunk.com.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Chrissy Kidd

Chrissy Kidd is a technology writer, editor, and speaker based in Baltimore. The managing editor for Splunk Learn, Chrissy has covered a variety of tech topics, including ITSM & ITOps, software development, sustainable technology, and cybersecurity. Previous work includes BMC Software, Johns Hopkins Bloomberg School of Public Health, and several start-ups. She's particularly interested in how tech intersects with our daily lives.

Stephen Watts

Stephen Watts works in growth marketing at Splunk. Stephen holds a degree in Philosophy from Auburn University and is an MSIS candidate at UC Denver. He contributes to a variety of publications including CIO.com, Search Engine Journal, ITSM.Tools, IT Chronicles, DZone, and CompTIA.
