Wouldn’t it be great to peek into the future and find answers to the problems that you’re facing today? This may sound like science fiction, but many companies currently possess this capability, and they are creating strategies around it to strengthen their monitoring and analytical capabilities.
One way to do this is time series forecasting, a statistical method. You can build on the insights from time series forecasting with techniques like anomaly detection to gain:
Time series forecasting is a way to forecast or predict behaviors based on historical, timestamped data.
For example, take a look at the time series data below. The data is charted and binned with 1-hour granularity. It shows a cyclical pattern: traffic begins ramping up at 8 PM, peaks at midnight, and then tapers off entirely by 9 AM.
As you can see, the volume of traffic on each day is not identical. For example, Mondays usually have much less traffic than any other day except Tuesday, which appears to be flat. This means that you must account for the hour of the day and the day of the week in order to make a proper forecast.
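One simple way to account for both factors is to average the historical counts for each (day of week, hour of day) slot and use that average as the forecast for the same slot in the future. The following is a minimal Python sketch of that idea, not the method used to produce the charts in this post; the function names and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def build_baseline(samples):
    """Average traffic for each (weekday, hour) slot.

    samples: iterable of (datetime, count) pairs, already binned hourly.
    Returns {(weekday, hour): mean count}, with weekday 0 = Monday.
    """
    buckets = defaultdict(list)
    for ts, count in samples:
        buckets[(ts.weekday(), ts.hour)].append(count)
    return {slot: mean(values) for slot, values in buckets.items()}

def forecast(baseline, when):
    """Expected count for a timestamp, or None if that slot was never observed."""
    return baseline.get((when.weekday(), when.hour))

# Hypothetical history: two Mondays and two Wednesdays, each sampled at 22:00.
history = [
    (datetime(2024, 1, 1, 22), 100),   # Monday
    (datetime(2024, 1, 8, 22), 120),   # Monday
    (datetime(2024, 1, 3, 22), 400),   # Wednesday
    (datetime(2024, 1, 10, 22), 420),  # Wednesday
]
baseline = build_baseline(history)
next_monday = forecast(baseline, datetime(2024, 1, 15, 22))
next_wednesday = forecast(baseline, datetime(2024, 1, 17, 22))
```

Because the slots are keyed on both weekday and hour, a quiet Monday evening never dilutes the forecast for a busy Wednesday evening.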
There are many advantages to using time series forecasting, and the greatest one is the ability to forecast expected behavior at a point in the future that you have not yet observed.
For example, one of your teams may be tasked with budgeting for storage needed for the next six months. This may not seem too hard if you’re dealing with a handful of servers, but what happens when you have tens of thousands of hosts with multiple volumes?
Another advantage of time series forecasting is the ability to forecast customer traffic over the coming days and then alert whenever the actual traffic deviates from the expected volume. This can be used to quickly identify the onset of a denial-of-service attack or a drop in traffic, either of which could point to a problem within your system.
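As a rough illustration, the comparison between expected and actual traffic could look like the Python sketch below. The `check_traffic` function and its `tolerance` parameter are hypothetical, and a fixed relative tolerance is only the simplest possible rule.

```python
def check_traffic(expected, actual, tolerance=0.5):
    """Classify one observation against its forecast.

    tolerance is a hypothetical relative threshold: 0.5 flags anything more
    than 50% above or below the expected volume.
    """
    if expected <= 0:
        raise ValueError("expected volume must be positive")
    deviation = (actual - expected) / expected
    if deviation > tolerance:
        return "spike"   # possible onset of a denial-of-service attack
    if deviation < -tolerance:
        return "drop"    # possible problem within the system
    return "normal"

# Hypothetical readings compared against a forecast of 1,000 requests.
alerts = [check_traffic(1000, actual) for actual in (1050, 2600, 300)]
```

The key point is that the threshold is expressed relative to the forecast for that moment, so the same rule works at the midnight peak and in the quiet morning hours.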
It may be difficult to see a usable pattern in the image above, so let’s overlay the data to bring out a pattern that’s more apparent to the human eye. The timewrap command is a perfect solution for this since it will overlay each day with a 24-hour time period on the x-axis.
Results in:
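The same overlay idea can be sketched outside of Splunk. The Python snippet below is an analogue of what the overlay does, not the timewrap implementation: it regroups hourly samples so that every day becomes its own 24-slot series sharing one x-axis. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def overlay_by_day(samples):
    """Regroup hourly samples into one 24-slot series per calendar day.

    samples: iterable of (datetime, count) pairs binned hourly.
    Returns {date: [count_at_hour_0, ..., count_at_hour_23]}, with None
    wherever an hour has no data.
    """
    series = defaultdict(lambda: [None] * 24)
    for ts, count in samples:
        series[ts.date()][ts.hour] = count
    return dict(series)

# Hypothetical samples from two consecutive days.
samples = [
    (datetime(2024, 1, 1, 20), 50),
    (datetime(2024, 1, 1, 23), 90),
    (datetime(2024, 1, 2, 20), 55),
    (datetime(2024, 1, 2, 23), 95),
]
overlaid = overlay_by_day(samples)
# All readings taken at 20:00, one per day, now line up in the same slot.
hour_20 = [day[20] for day in overlaid.values()]
```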
Time series forecasting will allow you to peek into the future and know what behavior to expect at any point in time. This will enable you to:
(Read our anomaly detection introduction.)
Hard thresholds will not work with cyclical data, since the values vary relative to the time of day and day of week, which could lead to:
Instead, you can determine how to construct dynamic limits using the following p Chart formula:
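In its common textbook three-sigma form, the p chart is written as shown below. The symbols are the standard ones and are assumptions here: $\bar{p}$ is the average proportion observed in the baseline, $n$ is the sample size, and CL is the center line.

```latex
\begin{aligned}
\mathrm{UCL} &= \bar{p} + 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n}} \\
\mathrm{CL}  &= \bar{p} \\
\mathrm{LCL} &= \bar{p} - 3\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n}}
\end{aligned}
```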
“Introduction to Statistical Quality Control” by Douglas C. Montgomery
In the formula above, the UCL represents the upper control limit while the LCL represents the lower control limit.
You’ll need to establish your standard baseline before you can calculate these limits. The standard baseline can be developed from the output of the timewrap command that we discussed above, which we used to overlay daily values on top of one another.
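Once a baseline proportion is available for a given time slot, the limits can be computed directly. The sketch below assumes the standard three-sigma p chart form and clamps the results to the valid range for a proportion; `p_chart_limits` and the sample inputs are hypothetical.

```python
from math import sqrt

def p_chart_limits(p_bar, n, sigmas=3.0):
    """Return (LCL, UCL) for a p chart.

    p_bar: baseline proportion for the slot, between 0 and 1.
    n: sample size behind each observed proportion.
    sigmas: width of the band; 3 is the conventional choice.
    """
    if not 0.0 <= p_bar <= 1.0:
        raise ValueError("p_bar must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    margin = sigmas * sqrt(p_bar * (1.0 - p_bar) / n)
    # A proportion cannot fall below 0 or rise above 1, so clamp the limits.
    lcl = max(0.0, p_bar - margin)
    ucl = min(1.0, p_bar + margin)
    return lcl, ucl

# Hypothetical slot: a 10% baseline proportion measured over samples of 100.
lcl, ucl = p_chart_limits(0.10, 100)
```

Computing one pair of limits per time slot yields a band that rises and falls with the baseline instead of a single hard threshold.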
Predicting future disk usage is another good use case for time series forecasting. You can plot out the disk usage per day and leverage the streamstats command to identify the previous day's value.
If you have enough data points, you can also fit a trend line and identify its slope. You can then extend that line to the point where it intersects the total capacity and read off the corresponding position on the x-axis. Since the x-axis represents time, this will give you a good idea of how long you have before a server runs out of disk space.
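The calculation can be sketched in Python as follows. This is a minimal example that assumes disk growth is roughly linear and uses an ordinary least-squares fit; the function name and the readings are hypothetical.

```python
def days_until_full(daily_usage_gb, capacity_gb):
    """Estimate days until usage reaches capacity using a least-squares line.

    daily_usage_gb: one reading per day, oldest first (day 0, 1, 2, ...).
    Returns the days remaining after the last reading, or None if usage is
    flat or shrinking.
    """
    n = len(daily_usage_gb)
    if n < 2:
        raise ValueError("need at least two readings")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_usage_gb) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_usage_gb))
    slope = sxy / sxx                      # growth in GB per day
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    day_full = (capacity_gb - intercept) / slope  # where the line meets capacity
    return max(0.0, day_full - (n - 1))

# Hypothetical volume: growing 10 GB per day toward a 200 GB capacity.
usage = [100, 110, 120, 130, 140]
# Day-over-day change, comparable to looking up the previous day's value.
deltas = [today - yesterday for yesterday, today in zip(usage, usage[1:])]
remaining = days_until_full(usage, 200)
```

Running the same function across every volume gives a ranked list of the hosts that will need attention first.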
Time series forecasting is not magic; it’s a statistical technique that uses historical, timestamped data to predict how the data will likely behave at some point in the future.
This posting does not necessarily represent Splunk's position, strategies or opinion.