When a critical service or application goes down, it’s all hands on deck for Site Reliability Engineers (SREs) and developers to restore it as soon as possible. However, the folks responsible for triaging and ultimately solving the issue often spend (#waste) a lot of time context-switching between numerous tools: following a war-room (Slack) channel, updating JIRA or ServiceNow tickets, taking screenshots of dashboards or errors from tools that other team members don’t have access to, referring to a runbook on Confluence, and, in parallel, investigating the incident in a siloed way.
Resolving incidents in this haphazard fashion comes at a cost. Without a structured approach, SREs and developers lose sight of the issue they are trying to solve, and they are unable to build a comprehensive post-mortem or retrospective report to learn from.
If you take a step back, it’s clear that telling a data-driven story is challenging for everyone. Data scientists have struggled with this for years: a recent Forrester survey reports that 99% of companies surveyed think data science is an important skill to develop, but only 22% of those companies have seen actual business value. This disconnect can be attributed to the challenges of communicating and delivering real data insights from data science work.
Jupyter Notebooks to the rescue!
Jupyter Notebooks rose to prominence in academia because they were able to help alleviate the challenge of communicating data insights. With a UI that combines live code, visualizations, narrative text and other media, the notebook interface allows you to look at artifacts independently from the rest of the noise, and determine what their outputs are – enabling you to test a hypothesis quickly, and communicate those results clearly to other team members.
When Splunk engineers started looking at ways to deconstruct an incident response, this UI proved invaluable. By combining this capability with the ease of collaboration in a Google Doc, SREs and developers have a powerful solution for launching an investigation into a production outage.
Simply sign in to access our new free beta offering, Splunk Investigate, to get a closer look at how you can easily investigate multiple data sources with our powerful, collaborative interface.
----------------------------------------------------
Thanks!
Cody Bunce