In the first part of this two-part series we talked about recent additions to version 3.5 of the Deep Learning Toolkit for Splunk (DLTK). Here in part 2 we want to walk through a few new algorithmic approaches available for time series analysis, which are especially interesting for anomaly detection and time series prediction.
When you analyze time series, there are many possible ways to generate meaningful insights, depending on your goal. If your goal is not yet entirely clear, it can make sense to examine a given time series more broadly first. Calculating matrix profiles can yield interesting insights for a variety of data mining tasks, including the discovery of patterns and motifs, or of anomalies in the form of novelties or discords in a time series. This is where the STUMPY Python library comes in handy, since it offers a wealth of options. Let’s have a look at an example:
In the screenshot above you can see a time series of logons over the course of a few weeks. A cyclical pattern that repeats over weekdays and weekends is clearly visible. Now we use STUMPY to calculate the matrix profile, shown as a green line below the logons chart. High values in the matrix profile can be interpreted as novelties in the time series: within a given window, the pattern is rarely seen anywhere else across the specified time series data. For example, if we filter on the upper 90th percentile we can easily derive an anomaly signal, visible as spikes in the orange line at the bottom of the chart. This tells us where the time series deviates most from its usual behaviour. With this information we can correlate other data sources, use the signal in an anomaly detector, or combine it with existing anomaly detection approaches such as the DensityFunction in MLTK. The key takeaway is that you learned something new about your data and possibly spotted interesting patterns in your time series automatically.
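If you want to experiment with this idea outside the dashboard, here is a naive pure-NumPy sketch of what a matrix profile is: for each window, the z-normalized Euclidean distance to its nearest non-trivial neighbour, together with the 90th-percentile anomaly filter described above. In DLTK you would use STUMPY's `stumpy.stump` instead, which computes this far more efficiently; the synthetic "logons" series below is made up purely for illustration.

```python
import numpy as np

def matrix_profile_naive(ts, m):
    """Naive matrix profile: for each subsequence of length m, the
    z-normalized Euclidean distance to its nearest neighbour outside
    an exclusion zone (stumpy.stump computes the same thing efficiently)."""
    n = len(ts) - m + 1
    # z-normalize every subsequence so shape, not scale, is compared
    windows = np.lib.stride_tricks.sliding_window_view(ts, m).astype(float)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0
    z = (windows - mu) / sigma
    mp = np.full(n, np.inf)
    excl = m // 2  # ignore trivial matches with overlapping windows
    for i in range(n):
        d = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        d[max(0, i - excl): i + excl + 1] = np.inf
        mp[i] = d.min()
    return mp

# Synthetic "logons" series: two weeks of a daily cycle, one anomalous day
rng = np.random.default_rng(0)
ts = np.tile(np.sin(np.linspace(0, 2 * np.pi, 24)), 14) + rng.normal(0, 0.1, 24 * 14)
ts[150:174] = rng.normal(0, 0.1, 24)  # novelty: one flat day with no cycle

mp = matrix_profile_naive(ts, m=24)
anomaly = mp > np.percentile(mp, 90)  # upper 90th percentile, as in the dashboard
print(int(np.argmax(mp)))  # window index of the strongest novelty
```

Filtering the matrix profile this way gives exactly the kind of sparse anomaly signal shown as the orange line in the dashboard.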
Sometimes everything runs smoothly, and sometimes changes happen. As a human analyst you often investigate your data and build charts to understand where such changes occur, and then derive rules you can use for alerting. Imagine you work in network operations and collect data that measures round trip times for different connections to your services. Under normal conditions those measurements are typically stable around a certain value. But what if sudden changes occur and you want to be actively notified? This is where change point detection is a useful approach: a model learns dynamically from your data where such changes occur.
The example above shows logs of a simple ping to the DNS service 1.1.1.1 with the measured round trip times. As you can see from the data and the chart, the values are typically around 14 ms. Occasionally the round trip time spikes above 200 ms. In this use case we switched between different VPN connections, which changed the routing of our ping requests and therefore the packet round trip times. That hardly matters for our simple sample use case, but for many business-critical applications it is worth being notified, or deriving KPIs that can be used for further alerting or reporting. Splunk’s IT Service Intelligence (ITSI) combines such measurements to describe the overall health score of IT or business related services. The green bars in the dashboard above show the output of a Bayesian online change point detection algorithm as described in this paper. Luckily it is available as a Python library, which has been integrated into the golden image, so it is ready to use in DLTK if you have a similar scenario. And if you want to tackle such a scenario at massive scale directly on your streaming data, there is good news: Splunk’s Data Stream Processor also has a built-in function for drift detection!
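To make the idea concrete, here is a compact sketch of the run-length recursion from the Adams & MacKay paper, simplified to Gaussian observations with known variance and a conjugate Normal prior on the mean (the library shipped in the golden image implements the general version). The synthetic round-trip-time data and all parameter values are made up for illustration; a change point shows up as a sudden drop in the most likely run length.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def bocpd(data, hazard=1 / 100, mu0=0.0, var0=1.0, obs_var=1.0):
    """Bayesian online change point detection (Adams & MacKay, 2007),
    simplified to Gaussian data with known variance and a Normal prior
    on the mean. Returns the most likely run length at each step."""
    n = len(data)
    run_probs = np.zeros(n + 1)
    run_probs[0] = 1.0            # start with run length 0
    mus = np.array([mu0])         # per-run posterior mean of the segment mean
    vars_ = np.array([var0])      # per-run posterior variance
    map_run = np.zeros(n, dtype=int)
    for t, x in enumerate(data):
        # predictive probability of x under each run's current posterior
        pred = gaussian_pdf(x, mus, vars_ + obs_var)
        growth = run_probs[: t + 1] * pred * (1 - hazard)  # run continues
        cp = (run_probs[: t + 1] * pred * hazard).sum()    # run resets
        new_probs = np.zeros(n + 1)
        new_probs[0] = cp
        new_probs[1 : t + 2] = growth
        new_probs /= new_probs.sum()
        run_probs = new_probs
        map_run[t] = int(np.argmax(run_probs))
        # conjugate update of each run's posterior, prepend a fresh prior
        post_var = 1.0 / (1.0 / vars_ + 1.0 / obs_var)
        post_mu = post_var * (mus / vars_ + x / obs_var)
        mus = np.concatenate(([mu0], post_mu))
        vars_ = np.concatenate(([var0], post_var))
    return map_run

# Synthetic round trip times: ~14 ms, then a jump to ~200 ms (VPN reroute)
rng = np.random.default_rng(1)
rtt = np.concatenate([rng.normal(14, 1, 60), rng.normal(200, 5, 40)])
map_run = bocpd(rtt, hazard=1 / 50, mu0=14.0, var0=100.0, obs_var=4.0)
# a large drop in the most likely run length signals a change point
change = next(t for t in range(1, len(map_run)) if map_run[t] < map_run[t - 1] - 10)
print(change)
```

The run-length posterior collapses back towards zero right where the new regime begins, which is the signal rendered as green bars in the dashboard above.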
As some of you might have seen, we have recently been investigating ways of making the predictive health score functionality in ITSI easier to use. In addition, we have been thinking about whether there are specific algorithms that could do a better job of predicting future values of the service health score. With that in mind, enter the Long Short-Term Memory (LSTM) network!
Although the example below isn’t ITSI specific, it could easily be applied to the service health and KPI data you find in ITSI. Here we take some server power consumption data and try to predict the power consumption based on metrics such as the CPU utilisation, disk accesses and core cycles. Unlike algorithms such as a Random Forest Regressor or Linear Regression, our LSTM makes its prediction based on a sequence of events; in this use case we have modelled sequences of 5 events. This means that data where trend is a factor can be modelled much more easily with an LSTM than with a standard regressor, which looks at only a single record to make a prediction.
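The essential data preparation step for any LSTM is turning the flat table of metrics into overlapping sequences. Below is a minimal sketch of that windowing, with made-up metric values standing in for CPU utilisation, disk accesses and core cycles; the resulting (samples, timesteps, features) array is the input shape an LSTM layer in a framework such as Keras or PyTorch would consume.

```python
import numpy as np

def make_sequences(features, target, seq_len=5):
    """Turn a flat table of metrics into overlapping sequences of
    seq_len events, shaped (samples, timesteps, features). Each
    sequence of the previous seq_len records predicts the next target."""
    X = np.stack([features[i : i + seq_len] for i in range(len(features) - seq_len)])
    y = target[seq_len:]
    return X, y

# Hypothetical server metrics per interval: cpu_util, disk_accesses, core_cycles
rng = np.random.default_rng(2)
n = 100
features = rng.random((n, 3))
power = features.sum(axis=1) + rng.normal(0, 0.05, n)  # toy power consumption

X, y = make_sequences(features, power, seq_len=5)
print(X.shape, y.shape)  # 95 sequences of 5 events, each with 3 metrics
```

Because every training sample carries the five preceding records, the network can pick up trend information that a single-record regressor never sees.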
We hope these new algorithms help you either improve your existing use cases or build entirely new ones along the lines of the examples above.
Happy Splunking,
Philipp