I’m excited to announce the launch of a new series of apps on Splunkbase: MLTK Smart Workflows. These apps provide domain-specific workflows, each built around a particular use case, that help you develop machine learning models with your own data. In this blog post, I’d like to take you through the process we adopted for developing the workflows.
Built on top of the Splunk Machine Learning Toolkit (MLTK), Smart Workflows are designed to easily guide you through the process of developing a machine learning model with a few click-and-select options - regardless of your prior experience with model building. All of the details of the techniques being used in the Smart Workflows, as well as additional sources of information, will be available to users if they are interested in learning more about data science.
We are releasing two Smart Workflows apps as part of the initial launch: the Smart Ticket Insights app for Splunk and the Smart Education Insights app for Splunk. You can read more about these apps in their respective blog posts, which describe the use cases they solve and how to use each app.
I hope that many of you will be familiar with the Experiment Assistants and Smart Assistants that exist in the MLTK today. Smart Workflows are similar to Smart Assistants in that they provide guided assistance, which helps users generate analyses and insight from their data. Users get help building models in a step-by-step process so they can still benefit from machine learning without being formally trained in data science.
This type of process is often described as automated machine learning (autoML) as it allows you to generate insight from your data by selecting a few options and clicking a few buttons – rather than having to build a model from scratch.
It is worth noting that Smart Workflows perform pre- and post-processing of the data on behalf of the user and can also apply multiple algorithms to the data in a single pipeline. This differs from the Smart Assistants, which have limited pre-processing options, and the experiments, which rely on users to determine what pre-processing to apply to the data. Neither Smart Assistants nor experiments allow users to apply multiple algorithms in a single pipeline.
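To make that difference concrete, below is a minimal sketch of the kind of multi-algorithm SPL pipeline a Smart Workflow might run on your behalf. The lookup file, field names and model names are purely illustrative (they are not taken from the apps themselves); the search simply chains a pre-processing algorithm (StandardScaler) with a clustering algorithm (KMeans) in one pipeline.

```
| inputlookup sample_tickets.csv
| fit StandardScaler response_time reassignment_count into ticket_scaler_model
| fit KMeans SS_response_time SS_reassignment_count k=4 into ticket_cluster_model
| stats count by cluster
```

StandardScaler writes its output to SS_-prefixed fields, which are fed straight into KMeans, so the pre-processing and modelling steps live in a single search.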
The starting point for building both of the Smart Workflows apps came from customers approaching us with a problem. In both cases, we identified a couple of aspects that made these problems suitable for a Smart Workflow solution:
Once we were confident that we ticked both of these boxes, we began testing hypotheses that we could prove (or disprove) as we sought to solve these problems.
Without going too much into the details, a good hypothesis should be simple and testable – identifying the subjects, variables and effects that you will be measuring. In our two examples we had the following hypotheses:
For those interested, I recommend this blog, which takes you through some of the mistakes to avoid when coming up with a hypothesis.
As with everything in Splunk, we need to start with some data. Critical to any data science project is understanding the data you are going to be using, its features (or fields), and the relationships between them. There is comprehensive documentation on how to prepare your data for machine learning here, and I’d also recommend the Analytics and Data Science course we run for some helpful tips and techniques.
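As a quick illustration of that exploratory step, a field-level profile in SPL might look something like the search below; the index and sourcetype are placeholders for whatever data source you are working with.

```
index=ticket_data sourcetype=servicedesk earliest=-30d
| fieldsummary
| table field count distinct_count numeric_count mean stdev
| sort - count
```

This gives a first view of how complete each field is and which fields are numeric, which helps when deciding what to feed into a model later on.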
Once you understand a bit about your data, you can apply some methods and analytics to see whether you can prove your hypothesis. This is fairly easy to achieve in the MLTK experiments, which will also provide you with the SPL needed for the model development. You may want to refer to this blog about techniques for pre-processing your data, which contains lots of useful hints about preparing data for processing by a model-generating algorithm.
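For example, the SPL produced for a simple classification test might look something like the sketch below. The lookup, fields, algorithm and model name here are assumptions for illustration rather than the searches the Smart Workflows actually generate: fit a model on historical data and score the predictions against the known values.

```
| inputlookup ticket_history.csv
| fit RandomForestClassifier category from priority urgency business_service into ticket_category_model
| score accuracy_score category against predicted(category)
```

If the accuracy (or whichever metric suits your hypothesis) is good enough, you have some evidence the hypothesis holds, plus a saved model to build on.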
Provided you are able to prove your hypothesis, it’s then a case of considering what the process would look like to make the analytics repeatable for other users or data sources. Often this entails sourcing the data, moving on to exploratory data analysis (EDA), then pre-processing, then model development and testing, before finishing with something that allows you to manage the models. (Disclaimer: you may need little bits of JavaScript to help users navigate between stages and manage some of the more complex searches…)
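Once the models are saved, the repeatable part of the process is largely a matter of applying them to new data. Continuing the illustrative (and entirely hypothetical) example from above:

```
| inputlookup new_tickets.csv
| apply ticket_scaler_model
| apply ticket_cluster_model
| stats count by cluster
```

Saved models can then be reviewed, shared or deleted from the Models page in the MLTK, which is typically where the model management stage ends up.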
We will be announcing more details about these Smart Workflows and other enhancements we are making to ML at Splunk at .conf20 this year, so be sure to join us for that! Also, stay tuned for further Smart Workflows to be released, and read on to find out more details about the Smart Ticket Insights app for Splunk and the Smart Education Insights app for Splunk.