It’s 3PM on a Friday, and your day is winding down. Suddenly, you get an urgent email from your boss asking you to set up an alert for monitoring volume. You consider this an easy task.
You set a hard threshold for what you think is a low volume based on the last four hours of incoming data. Then, you expand the timeline over several days and see the following:
The blue line represents the actual volume count at fifteen-minute granularity, while the red line is the upper limit and the green line is the lower limit. Clearly, hard thresholding will not work in this scenario because the data has a cyclical pattern.
Luckily, there is a way to handle this cyclical yet predictable dataset: the density function.
Short for probability density function, the density function learns from past data to build a model that maps out dynamic thresholds. From this, you can understand what is normal relative to a certain hour of the day or a certain day of the week.
The density function ships with the Splunk Machine Learning Toolkit (MLTK), and it’s easy to use out of the box.
(Explore all the algorithms in the Machine Learning Toolkit.)
The density function works by looking over a large dataset of time series values.
In the following example, these values represent the number of calls received from customers over a 1-hour time period. The density function will create a model that illustrates the probability that a value will be within a particular range in a particular “bucket” of time. These buckets are defined as the “HourOfDay” and the “DayOfWeek.” If the value falls outside of the expected range, it will be considered an anomaly.
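As a minimal sketch, assuming the call records live in a hypothetical index named call_center, you could compute the hourly volume and derive the two bucket fields in SPL; strftime’s "%H" yields the hour of day (00–23) and "%A" the weekday name:

```
index=call_center sourcetype=calls earliest=-30d
| timechart span=1h count as Actual
| eval HourOfDay=strftime(_time, "%H"), DayOfWeek=strftime(_time, "%A")
```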
Before you can use the density function, you must fit the density function algorithm against your traffic volume field (i.e. “Actual”) to create the model.
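Continuing the hypothetical search above, the fit step might look like this (call_volume_model is an assumed model name):

```
index=call_center sourcetype=calls earliest=-30d
| timechart span=1h count as Actual
| eval HourOfDay=strftime(_time, "%H"), DayOfWeek=strftime(_time, "%A")
| fit DensityFunction Actual by "HourOfDay,DayOfWeek" into call_volume_model
```

The by clause learns a separate distribution for each HourOfDay/DayOfWeek combination (24 × 7 = 168 groups, well under the default group limit discussed below), and into saves the model for later reuse.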
The model acts as a relationship mapper, and it needs to be applied against unseen datasets in order to determine anomalous behavior. Once a model has been built, you can reuse it with the apply command.
The apply command will generate a new field called “IsOutlier(Actual),” which will determine whether the model believes that the “Actual” value is anomalous at that point in time.
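As a sketch against the same hypothetical call_center data, an alert search could apply the model to recent traffic and keep only the anomalous rows (note the single quotes, which SPL requires when referencing a field name that contains parentheses):

```
index=call_center sourcetype=calls earliest=-24h
| timechart span=1h count as Actual
| eval HourOfDay=strftime(_time, "%H"), DayOfWeek=strftime(_time, "%A")
| apply call_volume_model
| where 'IsOutlier(Actual)'=1
```

Saved as an alert, this search returns results, and therefore fires, only when the observed volume falls outside the range the model learned for that hour and day.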
The density function offers fast time to value (TTV) when solving complex problems. More importantly, it is easy to use, and most engineers can figure it out without a background in statistics.
The biggest challenge of using the density function is recognizing when you need it – more on this shortly.
The density function has a default limit of 1,024 groups, which constrains how finely you can split your data by time and by entity at once. In practice, this means your time granularity should be no finer than one hour (24 HourOfDay buckets), leaving room for no more than 42 additional split-by entities (24 × 42 = 1,008 groups). These limits are adjustable, but if you find yourself constantly bumping them up, this might not be the correct tool for the job.
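If you do need more headroom, the group limit can be raised in mlspl.conf. The stanza and setting names below are assumptions based on common MLTK configuration; verify them against the spec file for your MLTK version:

```
# local/mlspl.conf (setting name assumed; check your MLTK version's spec)
[DensityFunction]
max_groups = 2048
```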
False alerting can be one of the biggest drawbacks of using the density function to create adaptive thresholds. These alerts may not be statistically false, but they may be too sensitive for the general business use case, firing so often that they provide no real value. Luckily, this can be adjusted by changing the sensitivity of your model.
For example, when you apply the model, you will get a new field called “IsOutlier(Actual)” with a binary 1/0 value. You can change the sensitivity by adjusting the probability threshold that produces this value, which maintains the integrity of the model while reducing the false alerts associated with it.
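One way to do this, assuming the same hypothetical model as before, is to lower the DensityFunction threshold parameter (default 0.01) at apply time, so that only values in a rarer region of the learned distribution are flagged:

```
| apply call_volume_model threshold=0.005
```

A threshold of 0.005 marks only the rarest 0.5% of the learned density as outliers, cutting the alert surface roughly in half compared with the default.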
Adaptive thresholding has a very narrow use case. For one, your data trends need to have a predictable historical pattern that is likely to continue into the future. For example, in our first image, you can see a cyclical pattern in which the volume is different at each hour of a particular day, but when you look at the same hour over the course of weeks, the volume is very similar.
If your dataset doesn’t have a predictable trend, then adaptive thresholding will probably not be a good solution.
Fitting a model (i.e., creating a machine learning model) with the density function can consume a lot of CPU, because the algorithm needs a large amount of historical data to learn adaptive thresholds from, and the accuracy of the model directly correlates with the size of the training set. Training models can therefore be expensive, so you may want to reserve this technique for high-value problems.
The density function is an excellent tool, since it provides fast time to value (TTV) when solving complex problems. You can create a model with it that can be used inside of your alerts to identify abnormal behavior at a particular time of day without relying on hard thresholds that will constantly fire false alerts. As with anything, there are costs associated with using this technique, so choose wisely!