Here at Splunk we’re passionate about helping our customers get as much value from their data as possible. Lila Fridley recently wrote about how to select the best workflow for applying machine learning, and Vinay Sridhar provided an example of anomaly detection in SMLE. Here we’d like to build on that content by providing some details about the Smart ITSI Insights App for Splunk, which is designed to help IT operations teams gain additional insights from ITSI using machine learning, all without having to be a data scientist!
I often get asked how we can help our customers extract the most value from their IT Service Intelligence (ITSI) deployments. In this blog series, I want to present a number of machine learning techniques that have been used to get the most out of ITSI.
Most of these techniques are wrapped up as repeatable content in the Smart ITSI Insights app for Splunk. I’d encourage you to check the app out and test the capabilities yourself as you read the blogs linked below.
Many of you will be familiar with the predictive analytics in ITSI, which is described in detail here. While this can be a powerful capability, we often hear from customers who are unsure which algorithm to apply, or who see seemingly unpredictable relationships between the service they want to predict and the KPIs that are used to generate the service health score.
For these reasons, we have been working on a new workflow for generating predictions in ITSI. This workflow allows users to inspect the relationships between the service health score and its KPIs, and to run statistical analysis to determine whether the data is well correlated. This correlation is really important: strongly coupled data makes for good prediction accuracy!
I will talk through this in more detail in the blog about making smarter predictions in ITSI.
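As a rough illustration of the kind of correlation check described above, here is a minimal Python sketch that ranks KPIs by how strongly they track a service health score. This is not the app's implementation: the Pearson coefficient is just one possible statistic, and the KPI names, the sample values and the `rank_kpis` helper are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_kpis(health, kpis):
    """Sort KPIs by the absolute strength of their correlation with health."""
    scores = {name: pearson(values, health) for name, values in kpis.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical sample data: one value per time bucket.
health_score = [100, 95, 90, 70, 60, 85, 98]
kpis = {
    "cpu_utilisation": [20, 30, 35, 70, 85, 45, 22],
    "queue_depth": [5, 3, 6, 4, 5, 6, 4],
}

ranking = rank_kpis(health_score, kpis)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

In this made-up data, `cpu_utilisation` moves almost in lockstep (inversely) with the health score, so it would be a promising input for a prediction, while `queue_depth` shows very little relationship.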
While ITSI has an awesome way of grouping alerts with machine learning in Smart Mode, many customers would like a similar approach that gives them more flexibility in how an episode is defined. Currently, Smart Mode defines not just the patterns in the data, but the episode aggregation policies too.
Graph analytics is something we have been talking about with increasing frequency at Splunk, and for ITSI it presents a great way of creating ‘smart’ episodes through the use of unsupervised community detection. We talk about this more in the Smarter ITSI Episodes Powered by Community Detection Algorithms blog.
ITSI has some awesome ways of understanding root cause through episode reviews, deep dive analysis and even the service analyser. More recently we have been doing some work around causal inference – a technique to identify causal relationships between data points – and in the blog on Smarter Root Cause Analysis: Determining Causality from your ITSI KPIs we outline how you can use causal inference to identify root cause from your KPIs.
The final topic I will be covering in this series is how to spot unusual activity in your environment.
Alerts and episodes are great for identifying known patterns of behaviour, such as poor network latency or a hard drive filling up, but they often struggle to flag truly unusual patterns of alerts generated across the environment. In the final blog post (Smarter Noise Reduction in ITSI) we will be walking through how you can identify unusual event storms through anomaly detection and text analysis.
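To give a flavour of the anomaly detection half of that idea, here is a minimal Python sketch that flags time buckets with an unusually high number of events. It uses a robust z-score based on the median and the median absolute deviation (MAD), which is an assumption chosen for illustration and not necessarily the method used in the app. The `flag_event_storms` helper, the threshold of 3.5 and the sample counts are all hypothetical, and the text analysis step is not covered here.

```python
from statistics import median


def flag_event_storms(counts, threshold=3.5):
    """Return the indices of buckets whose event count is unusually high.

    Uses a robust z-score: 0.6745 * (count - median) / MAD. Only spikes
    above the median are flagged, since a storm means too many events.
    """
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * (c - med) / mad > threshold
    ]


# Hypothetical number of notable events per 5-minute bucket.
events_per_interval = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14]

storms = flag_event_storms(events_per_interval)
print(storms)
```

The median and MAD are used instead of the mean and standard deviation so that a single large storm does not inflate the baseline it is being compared against.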
Hopefully you will be able to gain some additional insight from your ITSI deployment using the Smart ITSI Insights app for Splunk and some of the content in this blog series. Keep an eye out for future blogs detailing how you can use SMLE to further improve some of the techniques we’ve outlined here.
For now it’s over to you to keep your IT systems ticking over smoothly with machine learning!
Happy Splunking!