The latest release of the Splunk Machine Learning Toolkit (MLTK) lets users upload pre-trained models to MLTK through a simple UI. Once a model is in Splunk, users can run it against their Splunk data with no modification to their existing workflows. This capability extends MLTK and ML-SPL beyond models trained in MLTK, unlocking a major use case: using external models with data inside Splunk. MLTK 5.4.0 is generally available for both Splunk Cloud Platform and Splunk Enterprise customers.
MLTK is an easy way for Splunk customers to get started with machine learning. The app provides Showcases and Assistants that guide the user through a series of steps to train, assess and operationalize ML models. It pairs backend ML components with a frontend app experience, abstracting away the complexity of data science notebooks and code. MLTK lets users bring machine learning into their Splunk workflows with two ML-SPL commands: fit for training ML models, and apply for running inference. MLTK is popular among Splunk customers and serves important machine learning use cases such as anomaly detection, forecasting and clustering. It is one of the most downloaded apps on Splunkbase, with over 185K downloads, and our customers run the fit and apply commands bundled with MLTK millions of times every month.
While customer demand for ML has grown rapidly, many Splunk customers have not been able to incorporate ML into their Splunk user journeys. Most MLTK customers want to bring new algorithms or pre-trained models into MLTK. According to our telemetry data, 80% of algorithms run in Splunk are customized. However, users find it very challenging to create ML models outside Splunk and ship them for use with their Splunk data. To use external models in Splunk, users have had to convert the models to an MLTK-supported codec format and import custom Python scripts with root permissions, which is time-consuming and tedious. This is a major pain point for our customers, who regularly ask for a better way to solve this issue.
MLTK 5.4 addresses these challenges with the option to upload externally trained ONNX models for inference in MLTK. Users can train their models in their preferred third-party environments, save the models in ONNX format, upload them to MLTK and run inference on their Splunk data. This way, users can move compute-intensive model training outside the Splunk platform and still operationalize the models within the Splunk platform using their Splunk data.
The uploaded model goes through a series of validation steps, including checks that the user has the required model upload capabilities and that the model is in the correct file format. After validation and verification, Splunk’s REST API is used to store the model in an MLTK-accessible location within Splunk. Users can then use the model with their Splunk data in the same way they would use a model created with MLTK, a workflow they are already familiar with. This way, users can focus on the important task of creating and training their ML models and leave the complexity of bringing the model into Splunk to MLTK.
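The two checks named above can be sketched in a few lines of Python. This is only an illustration of the idea, not MLTK's implementation: the capability name, the UploadRequest type and the validate_upload function are all hypothetical, and the real validation may well inspect the file contents rather than just the file name.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical capability name, chosen for this sketch only.
UPLOAD_CAPABILITY = "upload_model_file"


@dataclass(frozen=True)
class UploadRequest:
    """What the sketch needs to know about an upload attempt."""
    user_capabilities: frozenset
    file_name: str


def validate_upload(request: UploadRequest) -> list:
    """Return a list of validation errors; an empty list means accepted."""
    errors = []
    # Check 1: the user must hold the upload capability.
    if UPLOAD_CAPABILITY not in request.user_capabilities:
        errors.append("user lacks the model upload capability")
    # Check 2: the file must look like an ONNX model (extension check only).
    if Path(request.file_name).suffix.lower() != ".onnx":
        errors.append("file is not an .onnx model")
    return errors
```

Returning every failed check, rather than stopping at the first, lets a UI show the user everything that needs fixing in one pass.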
Prior to this release, users were limited to the ML algorithms and libraries packaged with MLTK. With the 5.4.0 release, MLTK supports inference with ONNX models. ONNX stands for Open Neural Network Exchange and is a common format for machine learning models. Models in this format can be created with a variety of machine learning frameworks, tools, runtimes, and compilers. Users can therefore take advantage of a wider range of libraries for training their models, such as TensorFlow, PyTorch, Keras and MATLAB.
The UI for uploading an external model is simple and intuitive. It requires a few parameters from the user that are helpful for verification of the model file and are also used during model inference.
Running model inference follows the same workflow that MLTK and ML-SPL users already know. There is one minor difference: the model name after the apply command needs the onnx: prefix. This tells MLTK and the apply command that the model being used for inference is an ONNX model.
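To make the prefix rule concrete, here is a small Python sketch that builds the apply clause of a search string. The helper names, the model name churn_model and the lookup file churn.csv are hypothetical; the only detail taken from this post is that an ONNX model name is written with the onnx: prefix after apply.

```python
ONNX_PREFIX = "onnx:"


def apply_clause(model_name: str, is_onnx: bool) -> str:
    """Build the apply clause, adding the onnx: prefix for ONNX models."""
    name = model_name
    # Avoid double-prefixing if the caller already wrote onnx:<name>.
    if is_onnx and not name.startswith(ONNX_PREFIX):
        name = ONNX_PREFIX + name
    return f"| apply {name}"


def build_search(base_search: str, model_name: str, is_onnx: bool = False) -> str:
    """Append the apply clause to an existing search string."""
    return f"{base_search.rstrip()} {apply_clause(model_name, is_onnx)}"


# A model trained in MLTK versus an uploaded ONNX model (names are made up).
print(build_search("| inputlookup churn.csv", "churn_model"))
print(build_search("| inputlookup churn.csv", "churn_model", is_onnx=True))
```

The rest of the search is unchanged in both cases, which is the point of the feature: only the model reference differs.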
The MLTK team is mindful of security concerns and has taken steps to ensure that only users with the appropriate permission can upload model files to Splunk. By default, the ability to upload models is disabled for all users. A Splunk admin must explicitly grant users permission to upload model files.
In addition to the pre-trained ONNX model capabilities, MLTK 5.4.0 extends the available anomaly detection capabilities with a new algorithm for multivariate outlier detection. Users can now provide a multivariate dataset as input to the new MultivariateOutlierDetection algorithm, which performs a series of steps internally to return the outliers in that dataset.
MLTK 5.4.0 is available today on Splunkbase for use with Splunk Cloud Platform as well as with Splunk Enterprise. For more information on how to use this feature, refer to the MLTK documentation. To get started with this new version today, visit Splunkbase.