Most classical, batch-oriented machine learning systems follow the “fit and apply” paradigm. In an earlier blog post, I discussed a few patterns for better organizing data pipelines and machine learning workflows in Splunk. In this blog, we’ll review a different way to organize your machine learning models: online learning.
The difference between batch learning and online learning systems is that in the former, you attempt to learn from a whole dataset at once, while in the latter, you take incremental steps and constantly update your model “online”. In practical applications, each approach has pros and cons, which can make it hard to decide which one is more suitable.
The main advantage of an online learning system is its typically lower compute and memory footprint, because you don’t have to process a large dataset all at once as in traditional batch learning. Whereas batch data processing can be costly and model training can take time, you can continuously feed smaller batches of data to the online learner and get faster responses. The system learns from these batches and memorizes the important characteristics in its model representation, while continuing to use them to make inferences about the data it is presented with. Additionally, as soon as new data points arrive, the model can adapt to new situations and therefore keep learning.
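To make this contrast concrete, here is a minimal sketch of such an incremental loop using the River Python library (which we will come back to below). The synthetic stream, the feature name "t" and the linear target are purely illustrative and not taken from any Splunk example.

```python
# Minimal sketch of the incremental "predict a little, learn a little" loop
# that characterizes online learning, using River's online linear regression.
# The synthetic stream and target below are purely illustrative.
from river import linear_model, preprocessing

# The | operator chains an online scaler and the regressor into a pipeline.
model = preprocessing.StandardScaler() | linear_model.LinearRegression()

# A batch learner would load the full dataset and fit it in one pass; the
# online learner processes one observation at a time, keeping memory roughly
# constant and adapting as soon as new data points arrive.
for t in range(1, 1001):
    x = {"t": float(t)}
    y = 2.0 * t + 5.0              # stand-in for an observed target value

    y_pred = model.predict_one(x)  # inference with the current model state
    model.learn_one(x, y)          # incremental update from a single sample
```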
With all those advantages in mind, there are also challenges to consider. The model should be able to handle concept drift, which occurs when the underlying data changes significantly. Additionally, if you only have the online model but no longer the historical data, it is difficult to meaningfully retrain the model if something goes wrong with your data or with the online algorithm of choice. In production-grade systems, you ideally have strategies in place to deal with such situations, especially if you rely on an online learning system for business-critical applications. Nevertheless, this approach is still a viable tool in your belt to consider for your use case.
Since version 3.8, the Splunk App for Data Science and Deep Learning (DSDL), formerly known as the Deep Learning Toolkit (DLTK), allows you to tap into online learning algorithms powered by the River Python library. It ships with a dedicated container image and an example of an online anomaly detector based on the HalfSpaceTrees algorithm, an online variant of isolation forests that works well when anomalies are spread out.
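To get a feel for what a HalfSpaceTrees detector does, here is a small, self-contained sketch using River directly on a synthetic access-count stream. This is not the example notebook shipped with DSDL; the hyperparameters, the 0.9 threshold and the simulated data are illustrative assumptions only.

```python
# Minimal sketch of an online anomaly detector with River's HalfSpaceTrees.
# The synthetic access-count stream and all hyperparameters are illustrative,
# not the configuration used by the DSDL example.
import random

from river import anomaly, compose, preprocessing

# HalfSpaceTrees expects features scaled to [0, 1], so a MinMaxScaler is
# chained in front of the detector.
model = compose.Pipeline(
    preprocessing.MinMaxScaler(),
    anomaly.HalfSpaceTrees(n_trees=10, height=8, window_size=50, seed=42),
)

random.seed(42)
for t in range(1000):
    # Mostly normal traffic, with an occasional spike standing in for an anomaly.
    count = random.gauss(400, 20) if t % 97 == 96 else random.gauss(100, 10)
    x = {"access_count": count}

    score = model.score_one(x)  # anomaly score; higher means more anomalous
    model.learn_one(x)          # update the detector with the new observation

    if score > 0.9:             # the threshold tunes the detector's sensitivity
        print(f"t={t}: potential anomaly (score={score:.2f}, count={count:.0f})")
```

Note that, as with most online learners, the scores only become meaningful after the detector has seen enough data, which corresponds to the warm-up phase discussed below.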
In the screenshot above, you can see a simple time series of the access count to a Recruiting Service, represented by the blue bars. The green line in the overlay indicates the anomaly score calculated by the online learning model. On the left side of the chart, you’ll notice that the score only appears after a defined warm-up phase, which is quite typical for online learners. If you follow the green line more closely, you can also see how, after a while, the learner adjusts from an average value of 0.40 to a lower value that stabilizes around 0.25 at the right end of the chart. Finally, the orange line indicates the flagged anomalies based on a threshold that can easily be adjusted to the desired sensitivity of the detector. This is how the 11 anomalies are automatically spotted; they could now easily be used for alerting or for more sophisticated correlation searches.
To conclude this online learning example, let’s look at what a practical workflow in Splunk would look like and what steps you would typically take to get your online learning system up and running in DSDL.
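On the container side, such a workflow boils down to a notebook that implements DSDL’s usual entry points. The following is a condensed sketch under that assumption; the function bodies, the pickle-based persistence and the assumption of purely numeric feature columns are simplifications for illustration, not the River example shipped with DSDL.

```python
# Condensed, illustrative sketch of the container-side code for an online
# learner in a DSDL-style notebook. The entry points mirror DSDL's usual
# notebook template; the shipped River example may be structured differently.
import pickle

import pandas as pd
from river import anomaly, compose, preprocessing


def init(df, param):
    # Create the online model once; afterwards it is only updated incrementally.
    return compose.Pipeline(
        preprocessing.MinMaxScaler(),
        anomaly.HalfSpaceTrees(seed=42),
    )


def fit(model, df, param):
    # Feed incoming events one by one instead of retraining on a full dataset.
    # Assumes df contains only numeric feature columns.
    for _, row in df.iterrows():
        model.learn_one(row.to_dict())
    return {"message": f"learned from {len(df)} events"}


def apply(model, df, param):
    # Score each event and hand the anomaly scores back to the Splunk search.
    scores = [model.score_one(row.to_dict()) for _, row in df.iterrows()]
    return pd.DataFrame({"anomaly_score": scores})


def save(model, name):
    # Persist the model state so subsequent searches continue learning from it.
    with open(f"{name}.pkl", "wb") as f:
        pickle.dump(model, f)
    return model


def load(name):
    with open(f"{name}.pkl", "rb") as f:
        return pickle.load(f)
```

From the Splunk side, you would then use the fit and apply search commands to keep training and scoring against this container model, and build alerts or correlation searches on top of the returned anomaly scores.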
I hope this blog post provides you with a novel approach to some of your machine learning challenges. Please note that not all algorithms are equally suited for online learning, so you should carefully evaluate your use cases and compare possible online learning approaches with traditional batch learning approaches to make an informed decision on which is the better fit.
If you are looking to learn more about the Splunk App for Data Science and Deep Learning, you can watch this .conf session to explore how BMW Group is using DSDL for a predictive testing strategy in automotive manufacturing. And if you are interested in how to use DSDL to scale out forecasting with Prophet, stay tuned for another blog post coming soon.
Happy online learning,
Philipp
Many thanks to Judith Silverberg-Rajna, Katia Arteaga and Mina Wu for their support in editing and publishing this blog post.