
Orgs Have 99 Problems, But Root Causes Are #1

Finding and fixing root causes during postmortems is best practice, so why aren’t organizations doing them?

Have repeat issues got you feeling like it’s Groundhog Day? Root cause analysis during postmortems can identify the underlying problem and suggest a fix. Although finding and fixing the root cause is always the right thing to do, according to The Hidden Costs of Downtime, a surprisingly high 54% of technology executives admit they sometimes intentionally leave it unresolved. Meanwhile, only 42% say their organization always performs a postmortem. So, why would an organization not perform one? Let’s delve into why companies may skip this critical step and offer practical solutions so you can more efficiently find the root cause, making your infrastructure more robust and reliable in the process.


To fix or not to fix?


It shouldn’t be a question. Yet isolating a problem within a complex hybrid environment of cloud-based and legacy systems can be arduous, and organizations may avoid fixing the problem because they don’t want to shine a light on the technical debt in their legacy systems, especially if they already plan to deprecate the application responsible for the outage.


As for postmortems more broadly, some downtime is resolved so quickly that many organizations decide it just isn’t worth investigating, especially because postmortems can be very labor-intensive without the proper solutions in place. However, postmortems make an entire infrastructure more robust and digitally resilient, preventing the same issues from recurring.


Don’t let history repeat itself


A proactive approach to downtime starts with learning from past mistakes. If your organization struggles to isolate a root cause during postmortems, investing in observability tooling and integrating data from across your environment into one centralized location will make the process much easier. By extending observability monitoring to more devices, organizations inherently gain more visibility into their security, helping them quickly pinpoint an incident’s root cause. Unifying data platforms matters: eliminating data silos and providing prompt access to data across tools allows organizations to perform thorough postmortems that prevent repeat issues.


AI- and ML-driven solutions are now essential for pattern recognition, while predictive analytics powered by AI can stop downtime-causing issues in their tracks. In fact, according to The Hidden Costs of Downtime, resilience leaders — organizations that suffer less overall downtime and less financial impact from it — adopt AI at a higher rate than non-leaders to support this preventative approach. On average, resilience leaders expand their use of discrete generative AI tools at five times the rate of non-leaders, and their use of generative AI features embedded in existing tools at four times the rate.


Turn a downtime disaster into a blueprint for success


Performing postmortems not only uncovers underlying issues that could lead to more downtime in the future, but also fosters a culture of continuous improvement and accountability, a key pillar of digital resilience. By identifying the root cause of an incident, organizations can implement targeted strategies to prevent recurrence, enhancing overall system reliability and promoting collaboration across teams.


So, why would an organization skip root cause analysis during a postmortem when it creates a path to innovation and digital resilience? Many security, application, and infrastructure issues can lead to downtime. Don’t keep repeat, preventable causes on that list.



Read Splunk’s The Hidden Costs of Downtime report to learn about downtime’s most common causes and get more recommendations for championing a more resilient business.
