Information Technology Operations Analytics (ITOA) is an analytics technology that uses datasets generated by IT systems to improve their efficiency and effectiveness as part of the practice known as IT operations management (ITOM). The primary goal of ITOA is to make IT operations more effective, efficient, faster and more proactive through the use of an organization’s own machine data.
By analyzing the historical and real-time machine data produced by all elements of an organization’s IT infrastructure, IT teams can not only make sure their systems are running at peak performance, but can also predict and prevent outages by learning from previous events. In other words, ITOA is the application of big data to solving the many challenges faced by the IT department while helping create better decision-making processes.
Splunk IT Service Intelligence (ITSI) is an AIOps, analytics and IT management solution that helps teams predict incidents before they impact customers.
Using AI and machine learning, ITSI correlates data collected from monitoring sources and delivers a single live view of relevant IT and business services, reducing alert noise and proactively preventing outages.
In many ways, ITOA is an umbrella term that captures a large number of IT operations, including reporting, querying and data analytics. In addition to providing operations analytics, many ITOA solutions are designed to bundle application performance management (APM) capabilities as well as configuration management tools to support a broad range of business, compliance and resource allocation requirements.
In this article, we’ll discuss how ITOA fits into modern IT operations, how it compares to observability, AIOps and capacity management, its benefits and challenges, and best practices for implementing ITOA in your organization.
IT operations are all the activities executed by the IT department in a company or organization that are designed to maintain and optimize the performance of the technology. This includes everything from the individual workstations used by members of the organization to the overall network infrastructure on which the organization’s systems run. As the complexity of network infrastructure has grown over the years, including the increasing use of microservices and cloud computing, the role, complexity and importance of IT operations has also increased.
Some specific examples of IT operations in the enterprise include:
ITOA is a key element of modern IT operations, and is the logical extension of the data revolution into the practice of organizational IT. Before ITOA, IT operations, other than scheduled maintenance, were almost completely reactive — fixing something when it stopped working. As IT systems became more complex and downtime became increasingly expensive in terms of organizational reputation and downtime penalties, a proactive approach became necessary. By using historical machine data and operational data to predict likely outages and prevent them from happening, ITOA gives IT operations teams an invaluable tool to:
Root cause analysis in IT operations is the practice of using all available data and information pertaining to an issue, event or outage to determine the core cause of the problem. Before the advent of sophisticated data analysis capabilities, root cause analysis often required a trial-and-error approach, in which each potential source of failure was isolated and investigated. These types of approaches were labor intensive, time consuming and expensive.
Thanks to ITOA’s machine-learning driven analytics capabilities, root cause analysis is now quicker and more effective because it uses the system’s own machine data to correlate the event in question with historical data from similar events. By using machine learning and system data, ITOA tools are able to find the root cause of an issue significantly faster.
Root cause analysis looks at logs and diagnostic data from applications, tracks changes in code, monitors capacity and usage and can be configured by users to monitor for specific key performance indicators (KPIs) they wish to track.
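The correlation step described above can be sketched in a few lines. This is a deliberately minimal illustration, not how any particular ITOA product works: the event records, component names and five-minute window are all hypothetical, standing in for the indexed machine data a real platform would query.

```python
from datetime import datetime, timedelta

# Hypothetical event records: (timestamp, component, severity).
# A real ITOA platform would pull these from indexed logs and metrics.
events = [
    (datetime(2023, 5, 1, 9, 58), "db", "error"),
    (datetime(2023, 5, 1, 9, 59), "db", "error"),
    (datetime(2023, 5, 1, 10, 0), "web", "warn"),
    (datetime(2023, 5, 1, 8, 30), "cache", "error"),
]

def likely_root_cause(events, outage_time, window_minutes=5):
    """Rank components by error count in the window just before the outage."""
    window_start = outage_time - timedelta(minutes=window_minutes)
    counts = {}
    for ts, component, severity in events:
        if severity == "error" and window_start <= ts <= outage_time:
            counts[component] = counts.get(component, 0) + 1
    # The component with the most errors right before the outage is the prime suspect.
    return max(counts, key=counts.get) if counts else None

print(likely_root_cause(events, datetime(2023, 5, 1, 10, 0)))  # db
```

The earlier cache error falls outside the window and is ignored, while the clustered database errors immediately preceding the outage surface as the likely cause — the same time-correlation idea, automated over far larger data volumes, that makes ITOA-driven root cause analysis faster than trial and error.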
AIOps is the practice of applying analytics, business intelligence and machine learning to big data, including real-time data, to automate and improve IT operations and streamline workflows. AI can automatically analyze massive amounts of network and machine data to find patterns, both to identify the cause of existing problems and to predict and prevent future ones.
The term AIOps was coined by Gartner in 2016. In the Market Guide for AIOps Platforms, Gartner describes AIOps platforms as “software systems that combine big data and artificial intelligence (AI) or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management and automation.”
While ITOA uses data to analyze events, it generally focuses on monitoring collected data to analyze events that occurred in the past. AIOps platforms use artificial intelligence (in the form of machine learning) to not only analyze issues and events that have already occurred, but also to predict future events and prevent them from happening. In that regard, AIOps is generally considered to make more significant and practical use of artificial intelligence than basic ITOA functionality. However, many people consider AIOps to be a further evolution of ITOA, and while you might hear the terms used synonymously, they are, in fact, not interchangeable.
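To make the AIOps idea of finding patterns in machine data concrete, here is a minimal anomaly-detection sketch. Real AIOps platforms use far more sophisticated machine learning; this stand-in simply flags data points that deviate sharply from the mean. The latency values and the 2.5-standard-deviation threshold are illustrative assumptions.

```python
import statistics

def anomalies(series, threshold=2.5):
    """Flag points more than `threshold` standard deviations from the mean —
    a toy stand-in for the pattern detection AIOps platforms perform."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    if stdev == 0:
        return []  # a flat series has no outliers
    return [i for i, x in enumerate(series) if abs(x - mean) / stdev > threshold]

# Hypothetical per-minute request latencies (ms) with one spike.
latency_ms = [120, 118, 125, 122, 119, 121, 480, 117]
print(anomalies(latency_ms))  # [6] — the index of the 480 ms spike
```

Flagging the spike as it emerges, rather than after users report slowness, is the shift from reactive analysis to the predictive, preventive posture that distinguishes AIOps.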
In the same way that ITOA can contain elements of AIOps, the overall function of ITOA and ITOM are increasingly coming under the umbrella of observability.
Observability is the ability to measure the internal states of a system by examining its outputs. A system is considered “observable” if its current state can be estimated using only information from outputs, namely sensor data. The term originated decades ago in control theory (which is about describing and understanding self-regulating systems), but it has increasingly been applied to improving the performance of distributed systems. Three types of telemetry data — metrics, logs and traces — make a system observable, providing deep visibility into distributed systems and allowing teams to get to the root cause of a multitude of issues and improve the system’s performance.
Observability allows teams to monitor modern systems more effectively and helps them to find and connect effects in a complex chain and trace them back to their cause. Further, it gives system administrators, IT operations analysts and developers visibility into their entire architecture.
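The three telemetry types and how they connect can be illustrated with a toy request handler. This is a conceptual sketch only — real systems would use an instrumentation framework such as OpenTelemetry — and the field names and structures here are hypothetical. The key idea it shows is that a shared trace identifier ties a log line and a metric to one request, which is what lets teams trace an effect back to its cause.

```python
import json
import time
import uuid

def handle_request():
    """Emit the three telemetry types for one request, linked by a trace id."""
    trace_id = uuid.uuid4().hex  # trace: identifies this request end to end
    start = time.perf_counter()
    # ... application work would happen here ...
    duration_ms = (time.perf_counter() - start) * 1000
    # Log: a discrete, structured event describing what happened.
    log = {"level": "info", "msg": "request handled", "trace_id": trace_id}
    # Metric: a numeric measurement suitable for aggregation and alerting.
    metric = {"name": "request.duration_ms", "value": duration_ms,
              "trace_id": trace_id}
    return log, metric

log, metric = handle_request()
print(json.dumps(log))
```

Because both records carry the same `trace_id`, a latency spike in the metric can be joined to the exact log events from the same request — the connective tissue that makes distributed systems observable.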
Observability and ITOA have the same fundamental goals: using data generated by IT systems to improve their efficiency and effectiveness. In common usage, observability defines a philosophy of action and ITOA defines a day-to-day role and practice within the IT organization. The distinctions between the two terms are not clearly defined and continue to evolve. The principles and practices of observability therefore support the ITOA function, but observability is not a replacement for ITOA, nor is ITOA an alternative to observability. One perceived difference could be related to the persona of the user of ITOA or observability tools. It could also be argued that observability is the latest iteration and, in fact, the evolution of the practice known as ITOA.
IT capacity management is the practice of ensuring that an organization’s IT systems and infrastructure are sufficient to the tasks required of them. IT capacity management broadly incorporates three main elements: business capacity management, service capacity management and component capacity management. Capacity management is also used to forecast future needs and justify pricing and expenditure on additional IT equipment, services and personnel to meet them, in an effort to make smarter and more cost-effective business decisions.
Capacity management is not a specific function of typical standalone ITOA tools, but many vendors are moving toward more integrated ITOA platforms that combine related functionality that includes capacity management.
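The forecasting side of capacity management can be sketched with a simple linear trend fit. This is a minimal illustration under stated assumptions — the monthly utilization figures are hypothetical, and real capacity-management tooling would account for seasonality, growth curves and confidence intervals rather than a straight line.

```python
def forecast(usage, periods_ahead):
    """Least-squares linear trend over historical utilization — a minimal
    stand-in for the forecasting that capacity-management tooling performs."""
    n = len(usage)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Project the fitted line `periods_ahead` steps past the last observation.
    return intercept + slope * (n - 1 + periods_ahead)

# Hypothetical monthly storage utilization (%) trending upward.
utilization = [52, 55, 59, 61, 66, 70]
print(round(forecast(utilization, 6), 1))  # 90.9 — projected six months out
```

A projection like this is what turns raw machine data into a planning signal: if utilization is on track to exceed a threshold in six months, the expenditure case for additional capacity can be made before the shortfall occurs.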
ITOA provides IT teams the ability to perform big data capture, indexing, management and search, all of which lend themselves to practical applications in an IT environment.
By using machine learning capabilities to collect and analyze large amounts of data, ITOA can enhance and accelerate IT log management, log search and analysis, and root cause analysis. It can also make performance predictions based on past performance data.
By performing the above-mentioned tasks automatically without requiring the involvement of the IT team, ITOA automates a wide variety of functions and leads to a number of benefits, including:
ITOA can be challenging for an organization that is used to manual processes and has not attempted to automate core IT functions, and knowing where to begin can be difficult. The term ITOA and related terms are used and combined in a wide variety of ways by end users, analysts and vendors, so it can be hard for an organization to identify what it needs to prepare (and what it needs to purchase) to implement an ITOA platform.
IT operations analytics solutions and tools are generally sold as complete packages on an operational analytics platform by different vendors. The components within an ITOA framework perform a variety of functions, including:
Implementing ITOA is similar to selecting any type of major application in an organization and starts with your organization’s established request for proposal (RFP) process. A few best practices for implementing these new processes include:
It’s possible that the future of ITOA is already here, in the form of AIOps. Others would say that the future of ITOA is an evolving component of observability. There isn’t yet a clear distinction among the terms ITOA, AIOps and observability, and they may be used interchangeably or in combination to describe a particular use case, software or hardware implementation. Regardless, both AIOps and observability represent the increased reliance on data, machine learning and artificial intelligence to perform IT analytics and maintain optimum efficiency of IT systems. ITOA and its related disciplines can only grow as the platforms evolve to make better use of machine learning and artificial intelligence. The more data an ITOA platform has available to it, and the more AI capabilities it incorporates, the better it will be able to predict future IT events, prevent issues and outages from occurring and create a better customer experience.
The predictive nature of AI as applied to ITOA also provides the opportunity to use ITOA as a predictive analytics planning tool, predicting the potential business impact of the IT functionalities that it monitors. Teams who are responsible for planning IT infrastructure advancements can use ITOA for capacity planning, to understand the ramifications, both positive and negative, of future growth. The future of ITOA lies squarely in the ability of ITOA vendors to take advantage of AI and data capabilities to provide additional predictive functionality.
ITOA is a discipline and methodology that brings data and analytics to the process of managing an organization’s IT infrastructure. There is no doubt that it is the future of ITOM. Any organization that wants to get the most value from its data, use it to maximize its IT investment and turn the combination into a distinct business and competitive advantage needs to investigate and implement an ITOA plan.
See an error or have a suggestion? Please let us know by emailing ssg-blogs@splunk.com.
This posting does not necessarily represent Splunk's position, strategies or opinion.