As organizations face the imminent threat of an IT service outage or cyberattack, they often fail to step back and understand how well they've planned to deal with the crisis. According to recent research, we know that:
Perhaps the most regrettable part of it all? Almost half (45%) of these organizations already acknowledge that their disaster recovery capabilities are inadequate. In this article, we'll discuss a framework and practical steps for creating a disaster recovery plan that sets you up for actual recovery, so you can stay resilient over the long term.
Disaster recovery planning is increasingly less about investing in cybersecurity solutions and multiple layers of cloud and on-site data center resources (though these support your overall business resilience plan). Instead, it is more about communication, governance, organizational structure, and a culture that is prepared to deal with crises.
How do you ensure business continuity amid the persistent threat of disasters, which may come from an external cyberattack, an IT service outage, a natural disaster, or a disgruntled employee with access to sensitive business information?
A disaster recovery planning strategy guards against these risks as a subset of the Business Continuity Plan (BCP), focusing on these stages of a disaster:
(Know the differences: business continuity vs. business resilience.)
All businesses should have a disaster recovery plan in place, especially since disasters can strike at any moment. Below are some key benefits of sound disaster recovery planning.
Often, though, the implementation of a disaster recovery plan is highly complex. Here are some of the major challenges businesses face.
Underestimating disasters. Many organizations seem to live by the mantra that such disasters will not happen to them. For that very reason, preparation, if done at all, is often done poorly. Yet hardware failures, network outages, and cyberattacks are real possibilities that can strike any organization at any moment, and businesses caught unprepared take longer to recover.
Budget constraints. Disaster recovery planning can be very costly, and many small and medium-sized businesses lack the budget to invest in backup systems or cloud storage. Because the benefits of such investments are not immediately visible, many firms delay vital preparations that may cost them much more in the future.
Complicated IT systems. Modern businesses depend on many types of software and systems, all linked together, which complicates recovery after a disaster. Every one of these components needs to work together and be recoverable, which requires experience, time, and planning.
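To make the dependency problem concrete, here is a minimal Python (3.9+) sketch that derives a restore order from a map of which systems depend on which, using the standard library's `graphlib.TopologicalSorter`. The system names and dependencies are hypothetical placeholders; in practice the map would come from your own inventory or configuration management data.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems that must be
# running before it can be restored.
DEPENDENCIES: dict[str, set[str]] = {
    "network": set(),
    "identity": {"network"},
    "database": {"network", "identity"},
    "app_server": {"database", "identity"},
    "web_frontend": {"app_server"},
}


def recovery_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a restore order in which every system follows its dependencies.

    Raises graphlib.CycleError if the map contains a dependency cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, system in enumerate(recovery_order(DEPENDENCIES), start=1):
        print(f"{step}. restore {system}")
```

A cycle in the map raises `graphlib.CycleError`, which is itself a useful finding: two systems each need the other to start, and the plan has to break that loop explicitly.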
Evolving cybersecurity threats. Ransomware and other cyberattacks keep increasing in frequency, yet many businesses' disaster recovery plans are outdated: they were designed for catastrophes like fires and floods and do not account for modern threats.
Inadequate testing. A disaster recovery plan can only be relied on if it is tested regularly. Many companies neglect this and end up with a plan that is outdated or simply won't work as expected when something actually goes wrong. Regular testing confirms the plan still works and reveals gaps before a real disaster does.
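Part of testing discipline is simply knowing which plans have gone stale. The sketch below assumes a hypothetical record of the last successful DR test per system and an illustrative 90-day testing interval (both are assumptions, not a recommended standard), and flags systems that were never tested or are overdue.

```python
from datetime import date, timedelta
from typing import Optional

# Illustrative policy (an assumption): each system's DR plan must pass a test
# at least once every 90 days.
MAX_TEST_AGE = timedelta(days=90)

# Hypothetical record of the last successful DR test per system.
last_successful_test: dict[str, Optional[date]] = {
    "payments-db": date(2024, 1, 15),
    "customer-portal": date(2024, 4, 2),
    "email-gateway": None,  # never tested
}


def overdue_tests(records: dict[str, Optional[date]], today: date,
                  max_age: timedelta = MAX_TEST_AGE) -> list[str]:
    """Return systems never tested, or whose last successful test is older than max_age."""
    return [
        system
        for system, tested_on in sorted(records.items())
        if tested_on is None or today - tested_on > max_age
    ]


if __name__ == "__main__":
    print(overdue_tests(last_successful_test, today=date(2024, 5, 1)))
    # ['email-gateway', 'payments-db']
```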
Skills shortage. Many organizations lack the skilled IT personnel needed to manage disaster recovery effectively. Without the right people, it's hard to be confident that everything will go right during an actual disaster. Training is also crucial, so that employees know precisely what to do when a crisis strikes.
Balancing speed and accuracy. When disaster strikes, businesses want to get back online as soon as possible. Speed can lead to mistakes, though, and those mistakes may become bigger problems later. Companies must balance the urge for speed against the need to do things right.
Disaster recovery planning has changed significantly over the decades as a result of technological advancements, changing business needs, and lessons learned from catastrophic events.
1970s. Disaster recovery planning began in the 1970s with the development of digital technologies. Many businesses transitioned from paper records to digital storage and became increasingly dependent on IT infrastructure, which offered better protection than paper records vulnerable to water damage, fire, and theft. Disaster planning firms emerged, offering hot, cold, and warm sites to mitigate technological failures. (More on this topic below.)
1980s. Regulations made it mandatory for banks (and, later, other industries) to have a backup plan to ensure their computers and data were safe. These laws helped create a whole industry around disaster recovery plans.
1990s. In the 1990s, the development of three-tier architecture, which separates the user interface (presentation tier), data processing and calculations (application tier), and data storage and management (data tier), drove a remarkable shift toward more efficient disaster recovery plans.
2000s. In the 2000s, server virtualization improved disaster recovery, enabling much faster recovery times and better redundancy.
2010s. In the 2010s, with the advancement of cloud computing, businesses began paying vendors to handle disaster recovery, making it more flexible, scalable, and affordable. This gave rise to Disaster Recovery as a Service (DRaaS).
Today. In this decade, disaster recovery plans focus on being proactive and using risk-based approaches. Companies now integrate AI-powered tools, continuous monitoring, and testing to predict and prevent disasters. Even though cyber threats and other disasters keep evolving and remain unpredictable, businesses with strong disaster recovery plans are far better prepared to handle them.
With that context, let's now turn to the practical side of creating and using your disaster recovery plan.
How do you plan for disaster recovery? Disaster recovery planning is about three key activities:
The goal of disaster recovery planning is to reduce business disruption when the underlying resources—computing, applications, and data—are rendered unavailable. (That could be due to an unforeseen threat, or an inevitability that you can only prepare for so much.)
A robust disaster recovery planning process ensures that cost-effective and practical measures are developed in anticipation of these threats, allowing the organization to recover even from disasters that take it by surprise.
(Understanding incident severity levels can help with risk prioritization.)
Below are a few important steps that you can follow for your disaster recovery planning.
The first step of an effective disaster recovery plan is to obtain strong support from all stakeholders, especially for resource investments and allocation. Disaster recovery requires investments in technology resources and activities that don't offer an immediate ROI but are critical to reducing the opportunity cost of a downtime incident. While management is responsible for implementing and executing a disaster recovery plan, its effectiveness depends on resource allocation, which requires approval from business decision-makers and top management.
Establish a dedicated team to oversee the planning, development, and execution of a disaster recovery plan. This team can include cross-functional members from multiple levels of the organizational hierarchy. The goal of a planning committee is to:
Quantify the business impact of a downtime incident across different workloads and operational activities. Create a risk profile that accounts for the cost of downtime as well as the probability of the threat, threat resilience, available alternatives, the opportunity cost of downtime, and the workload's role in disrupting other dependent operational activities and services.
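One common way to put numbers on this is an expected-loss estimate: annual likelihood of a disruptive incident × expected outage length × hourly cost of downtime, where the hourly cost also counts the dependent services that go down with the workload. The sketch below is a simplified illustration of that idea, not a complete risk model; the workloads and figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    cost_per_hour: float          # direct cost of one hour of downtime
    annual_probability: float     # estimated chance of a disruptive incident per year
    expected_outage_hours: float  # expected outage length if it happens
    dependents: list[str] = field(default_factory=list)  # workloads that go down with it


# Hypothetical workloads and figures, for illustration only.
WORKLOADS = {
    "database": Workload(2000, 0.2, 8, ["checkout", "reporting"]),
    "checkout": Workload(5000, 0.1, 4),
    "reporting": Workload(200, 0.3, 12),
}


def hourly_impact(name: str, workloads: dict[str, Workload]) -> float:
    """Hourly cost of an outage of `name`, including everything that depends on it."""
    seen: set[str] = set()
    stack = [name]
    total = 0.0
    while stack:
        current = stack.pop()
        if current in seen:  # count each affected workload once
            continue
        seen.add(current)
        total += workloads[current].cost_per_hour
        stack.extend(workloads[current].dependents)
    return total


def expected_annual_loss(name: str, workloads: dict[str, Workload]) -> float:
    w = workloads[name]
    return w.annual_probability * w.expected_outage_hours * hourly_impact(name, workloads)


def risk_profile(workloads: dict[str, Workload]) -> list[tuple[str, float]]:
    """Workloads ranked by expected annual loss, highest first."""
    scores = [(n, expected_annual_loss(n, workloads)) for n in workloads]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, loss in risk_profile(WORKLOADS):
        print(f"{name}: {loss:,.0f} per year")
```

Sorting by expected loss gives only a first-pass risk profile; factors such as threat resilience and available alternatives still have to be weighed separately.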
Evaluate the cost of disaster recovery for each item, and prioritize disaster recovery objectives for the most impactful operational activities and services. Some of the important metrics to consider are:
(Related reading: risk appetite vs. risk tolerance: what's the difference?)
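Once each item has an estimated recovery cost and an estimated reduction in expected loss, the cost-evaluation and prioritization step can be sketched as a budgeted selection. The example below uses a simple greedy heuristic (best loss reduction per unit of cost first); the option names, costs, and budget are hypothetical, and a greedy pick is not guaranteed to find the best possible combination.

```python
from typing import NamedTuple


class DrOption(NamedTuple):
    workload: str
    annual_cost: float     # yearly cost of the DR measure (assumed > 0)
    loss_reduction: float  # estimated drop in expected annual loss it buys


def prioritize(options: list[DrOption], budget: float) -> tuple[list[str], float]:
    """Greedily fund options with the best loss reduction per unit of cost.

    A heuristic: it can miss the optimal combination when single options
    consume a large share of the budget.
    """
    ranked = sorted(options, key=lambda o: o.loss_reduction / o.annual_cost, reverse=True)
    chosen: list[str] = []
    spent = 0.0
    for option in ranked:
        if option.loss_reduction <= option.annual_cost:
            continue  # costs at least as much as it saves
        if spent + option.annual_cost <= budget:
            chosen.append(option.workload)
            spent += option.annual_cost
    return chosen, spent


# Hypothetical candidate measures.
OPTIONS = [
    DrOption("database", annual_cost=4000, loss_reduction=9000),
    DrOption("checkout", annual_cost=1000, loss_reduction=1800),
    DrOption("reporting", annual_cost=1000, loss_reduction=500),
]

if __name__ == "__main__":
    print(prioritize(OPTIONS, budget=5000))  # (['database', 'checkout'], 5000.0)
```

With a budget of 1,500, the same code skips the database option (it doesn't fit) and still funds checkout, which shows why the loop keeps going after an unaffordable item.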
RTO (recovery time objective) and RPO (recovery point objective) illustrated on a timeline, before and after a disaster occurs. (Original image source.)
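In short, the recovery point objective (RPO) bounds how much data, measured in time, you can afford to lose, and the recovery time objective (RTO) bounds how long the service can be down. Here is a minimal sketch of checking an incident or DR drill against those targets, assuming you have timestamps for the last recoverable backup, the disaster, and service restoration (all values below are hypothetical).

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_backup: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: changes made after the last recoverable backup are lost."""
    return disaster - last_good_backup


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time from the disaster until the service is usable again."""
    return service_restored - disaster


def meets_objectives(last_good_backup: datetime, disaster: datetime,
                     service_restored: datetime,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    return (achieved_rpo(last_good_backup, disaster) <= rpo_target
            and achieved_rto(disaster, service_restored) <= rto_target)


if __name__ == "__main__":
    # Hypothetical drill: nightly backup at 02:00, outage at 05:30, restored at 07:45.
    backup = datetime(2024, 3, 1, 2, 0)
    outage = datetime(2024, 3, 1, 5, 30)
    restored = datetime(2024, 3, 1, 7, 45)
    print(achieved_rpo(backup, outage))    # 3:30:00
    print(achieved_rto(outage, restored))  # 2:15:00
    print(meets_objectives(backup, outage, restored,
                           rpo_target=timedelta(hours=4),
                           rto_target=timedelta(hours=4)))  # True
```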
Your disaster recovery plan can focus on a variety of recovery strategies based on the risk profile and business value. These strategies can include backup in a few areas:
If the applicable data and application backups are stored in the cloud, you may choose from a variety of storage tiers that give different levels of recovery performance and service level agreement (SLA) guarantees at different price points.
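Choosing a tier then becomes a constraint check: take the cheapest tier whose retrieval time, plus the rest of the restore work, still fits inside the RTO. The tier names, retrieval times, and prices below are illustrative assumptions, not any provider's actual offering; substitute the figures and SLA terms from your own provider.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class StorageTier:
    name: str
    retrieval_time: timedelta   # time before backed-up data is readable again
    price_per_gb_month: float   # illustrative price, not a real provider's rate


# Hypothetical tiers; replace with your provider's published figures and SLAs.
TIERS = [
    StorageTier("hot", timedelta(minutes=1), 0.020),
    StorageTier("cool", timedelta(hours=1), 0.010),
    StorageTier("archive", timedelta(hours=12), 0.002),
]


def cheapest_tier(tiers: list[StorageTier], rto: timedelta,
                  restore_overhead: timedelta) -> Optional[StorageTier]:
    """Cheapest tier whose retrieval time plus remaining restore work fits the RTO.

    Returns None if no tier can meet the RTO.
    """
    eligible = [t for t in tiers if t.retrieval_time + restore_overhead <= rto]
    return min(eligible, key=lambda t: t.price_per_gb_month, default=None)


if __name__ == "__main__":
    # Assume restoring and validating a workload takes 2 hours after retrieval.
    tier = cheapest_tier(TIERS, rto=timedelta(hours=4), restore_overhead=timedelta(hours=2))
    print(tier.name if tier else "no tier meets the RTO")  # cool
```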
In order to develop a practical disaster recovery plan, incentivize disaster recovery activities across all business functions and hierarchical levels. Understand their needs; identify their limitations, especially those pertaining to risk mitigation and recovery; develop a governance and reporting mechanism that makes it easy to communicate and collaborate on threat risks, threat incidents, and disaster recovery activities where and when needed.
Key starting points include a strong focus on eliminating silos between teams, hierarchical levels, and business functions, as well as automating the reporting and collaboration process.
Orchestration technology is very important for effective disaster recovery. It streamlines and automates the recovery process during a disaster by coordinating communication between teams, tools, and systems, with the aim of minimizing downtime. Orchestration tools integrate with existing infrastructure to automate key functions like:
Ultimately, orchestration technology helps organizations bounce back faster from evolving threats, reduce errors, and maintain business continuity, minimizing the impact of disasters.
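The core loop an orchestration tool automates can be sketched in a few lines: run recovery steps in a fixed order, verify each with a health check, retry a limited number of times, and halt for human intervention rather than pushing ahead on a failed step. This is a hypothetical, minimal illustration, not the behavior of any particular orchestration product; the `Step` structure and the stub actions are assumptions.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("dr-runbook")


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # performs the recovery step (hypothetical hook)
    verify: Callable[[], bool]  # health check confirming the step worked
    retries: int = 2            # extra attempts after the first failure


def run_runbook(steps: list[Step], retry_delay_s: float = 5.0) -> list[str]:
    """Run steps in order and halt at the first step that cannot be verified.

    Returns the names of completed steps, so responders know where to resume.
    """
    completed: list[str] = []
    for step in steps:
        for attempt in range(1, step.retries + 2):
            log.info("step=%r attempt=%d", step.name, attempt)
            try:
                step.action()
                if step.verify():
                    completed.append(step.name)
                    break
                log.warning("step=%r failed verification", step.name)
            except Exception:
                log.exception("step=%r raised an error", step.name)
            if attempt <= step.retries:
                time.sleep(retry_delay_s)
        else:
            # No attempt succeeded: stop instead of running later steps on a broken base.
            log.error("halting at step=%r; manual intervention needed", step.name)
            return completed
    return completed


if __name__ == "__main__":
    # Stub actions stand in for real restore and health-check calls.
    steps = [
        Step("restore database from backup", lambda: None, lambda: True),
        Step("start application servers", lambda: None, lambda: True),
        Step("redirect traffic to recovery site", lambda: None, lambda: True),
    ]
    print(run_runbook(steps))
```

Halting on a failed verification trades a little speed for accuracy: a fast but unverified failover can turn one outage into two.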
Disaster recovery plans are essential for minimizing downtime and ensuring business continuity in the face of unforeseen disasters or cyberattacks. By proactively addressing risks, investing in scalable solutions, and testing recovery processes regularly, businesses can significantly reduce operational disruption. A properly set up disaster recovery plan not only protects vital data and systems but also helps organizations recover quickly and maintain trust with stakeholders.
This posting does not necessarily represent Splunk's position, strategies or opinion.