Algorithms are at the heart of the technologies we use in virtually every facet of our daily lives: formulas and processes that help us connect, solve problems and accomplish amazing things, such as better speech recognition, landing an autonomous rocket on a drone ship, or really great Netflix recommendations. But an algorithm is just a set of rules or tasks to perform given a certain input. Algorithms and artificially intelligent systems (systems powered by machine learning) reach their true potential only when they are trained on lots and lots of data.
You don’t typically think about training computer systems, particularly ones with advanced capabilities like AIOps platforms. In reality, IT systems only reach their potential if they have patient and thorough trainers to guide them.
It seems unlikely that a system able to deduplicate a million events, predict system availability issues 30 or more minutes in advance and automatically remediate identified issues would require additional training, but algorithms only take these platforms so far. AIOps systems are as capable as described, but their real strength lies in their ability to learn and adapt. That learning comes from multiple sources.
More data can produce more accurate insights. But this doesn’t mean throwing any old data at the system. Rather, the system becomes more accurate as you give it data similar to what it has already seen. That data will likely come from similar sources, but the added variety lets the system apply its algorithms and identify adjustments it may need to make for specific applications or services.
Feedback from IT operators (the trainers) further instructs the platform on how to behave and report in the environment. It is not uncommon for early adopters of AIOps systems to alternate between elation and wondering why the system sends alerts on items the team would prefer to have logged. The reason is simple: the system needs feedback from operators to learn what demands immediate attention and what is appropriate to log without reporting. The IT department doesn’t need to know about every failed action, but it does want to know if the rate of failures increases or if failures are concentrated in specific areas of the environment.
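To make the feedback loop concrete, here is a minimal Python sketch. It is not how any particular AIOps product works; the names FeedbackRouter, min_votes, failures_need_attention and rate_factor are all hypothetical, chosen only for illustration. The idea is that an event type keeps alerting until operators have said often enough that they would rather see it logged, and that a separate check flags a failure rate that has climbed well above its baseline. It does not attempt to detect failures concentrated in one area of the environment.

```python
from collections import defaultdict


class FeedbackRouter:
    """Routes event types to "alert" or "log" based on operator feedback."""

    def __init__(self, min_votes=3):
        # Hypothetical setting: how many feedback items are needed
        # before the cautious default ("alert") can be overridden.
        self.min_votes = min_votes
        self._votes = defaultdict(lambda: {"alert": 0, "log": 0})

    def record_feedback(self, event_type, preferred):
        """Store one operator opinion about how an event type should be handled."""
        if preferred not in ("alert", "log"):
            raise ValueError("preferred must be 'alert' or 'log'")
        self._votes[event_type][preferred] += 1

    def route(self, event_type):
        """Return "alert" or "log" for an incoming event of this type."""
        votes = self._votes.get(event_type, {"alert": 0, "log": 0})
        if votes["alert"] + votes["log"] < self.min_votes:
            return "alert"  # stay cautious until operators have weighed in
        return "log" if votes["log"] > votes["alert"] else "alert"


def failures_need_attention(failed, total, baseline_rate, rate_factor=2.0):
    """True when the observed failure rate is well above the baseline rate."""
    if total == 0:
        return False
    return failed / total > baseline_rate * rate_factor
```

In use, an operator who keeps marking a noisy event type as "log" will eventually stop being alerted on it, while a jump from a 1% to a 3% failure rate would still be surfaced under the assumed rate_factor of 2.0.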
Organizations have varying thresholds and service level objectives (SLOs). AIOps platforms require training on how to respond when those thresholds are exceeded or SLOs are at risk. Often these thresholds align with customer satisfaction and other business impacts, which computers have no inherent awareness of. The system needs to be trained on the right course of action based on the anticipated outcome.
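One common way to express "an SLO is at risk" is through an error budget: the number of failures the objective tolerates over a period. The Python sketch below assumes that framing. The function names and the at_risk_below cutoff are hypothetical values an organization would set for itself, not settings from any specific platform.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target is the desired success ratio, e.g. 0.999 for "three nines".
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # No failures are tolerated (or there is no traffic yet).
        return 0.0 if failed_requests else 1.0
    return 1.0 - failed_requests / allowed_failures


def slo_status(slo_target, total_requests, failed_requests, at_risk_below=0.25):
    """Classify an SLO as "healthy", "at_risk" or "breached".

    at_risk_below is an assumed cutoff: the SLO counts as at risk once
    less than this fraction of the error budget remains.
    """
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    if remaining <= 0:
        return "breached"
    if remaining < at_risk_below:
        return "at_risk"
    return "healthy"
```

With a 99.9% target and 100,000 requests, the budget is roughly 100 failures. Under these assumptions, 50 failures leaves the SLO healthy, 80 puts it at risk and 120 breaches it.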
Finally, AIOps delivers the capacity to perform a variety of functions based on the data it has and the trends it identifies. The response can be as simple as logging the frequency of a scenario, or it can mean initiating an incident response runbook to engage the right team members, or independently remediating the issue using scripts or automation. It’s the role of IT operators and developers to train the system on the best response for a variety of scenarios.
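The three kinds of response described above (log it, start a runbook, remediate automatically) can be pictured as a small policy that operators fill in over time. This is only an illustrative Python sketch. Response, ResponsePolicy, teach and the scenario names are hypothetical, and a real platform would learn from far richer signals than a lookup table.

```python
from enum import Enum


class Response(Enum):
    LOG = "log"              # record the occurrence and its frequency
    RUNBOOK = "runbook"      # engage people through an incident response runbook
    REMEDIATE = "remediate"  # fix the issue with scripts or automation


class ResponsePolicy:
    """Maps scenarios to responses, as taught by operators and developers."""

    def __init__(self, default=Response.RUNBOOK):
        # Assumption: scenarios nobody has taught yet go to a human by default.
        self._default = default
        self._rules = {}

    def teach(self, scenario, response):
        """Record the preferred response for a named scenario."""
        if not isinstance(response, Response):
            raise TypeError("response must be a Response member")
        self._rules[scenario] = response

    def respond(self, scenario):
        """Return the taught response, or the default if none was taught."""
        return self._rules.get(scenario, self._default)


policy = ResponsePolicy()
policy.teach("single_failed_login", Response.LOG)
policy.teach("disk_nearly_full", Response.REMEDIATE)
```

Defaulting to the runbook is a deliberate choice in this sketch: until the team has taught the system otherwise, an unfamiliar scenario reaches people rather than being silently logged or automatically changed.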
Training computer systems shouldn’t seem daunting; we do it all the time. Anyone who uses autocorrect understands that the system may produce some hiccups early on (hopefully nothing too embarrassing; we’ve all seen the screenshots). Autocorrect requires training. It needs to learn the words that you use on a regular basis. It learns your speech patterns and how you typically communicate. Over time it becomes a valuable time saver when writing, or at least saves you from an embarrassing typo.
AIOps platforms, while far more capable and complex, are no different: they perform best when trained by experienced IT professionals who can apply the needs of the department and business to the outcomes the platform supports.
Want more? 6 Myths of AIOps Debunked will help you cut through the hype and get to the core of what AIOps is, and what it isn’t.
----------------------------------------------------
Thanks!
Josh Atwell