Splunk Observability Cloud is a full-stack observability solution, combining purpose-built systems for application, infrastructure and end-user monitoring, pulled together by a common data model, in a unified interface. This provides essential end-to-end visibility across complex tech stacks and various data types, such as metrics, events, logs, and traces (MELT), as well as end-user sessions, database queries, stack traces and more. However, the sheer volume and variety of data can make pinpointing and resolving issues a daunting task, often relying heavily on individual expertise and familiarity with tools. This is where the AI Assistant comes into play, providing a conversational interface to surface insights and streamline investigation and exploration across the entire environment.
At its core, the AI Assistant allows users to interact with Observability data and compose workflows (e.g., troubleshooting, exploration) using natural language. This addresses a wide range of observability activities and use cases: inspecting and analyzing the health of a Kubernetes cluster, identifying sources of latency in a complex service topology, finding span attributes associated with errors, pinpointing root causes or surfacing patterns among logs, and so on.
Figure 1. The user interface of the AI Assistant in Observability Cloud
Splunk has a rich history of applying advances in language modeling to enhance offerings across observability and security. Recently, we announced fundamental enhancements to the Splunk AI Assistant for SPL (Search Processing Language), which provides a natural language interface for constructing and understanding SPL, the language for expressing queries in the Splunk platform. The AI Assistant in Observability Cloud (the “AI Assistant”) represents our continued investment in this area. (The AI Assistant is currently available to select private preview participants upon Splunk's prior approval.)
The ability of large language models (LLMs) to produce impressive answers and analysis in the observability domain inspired us to bring the background knowledge and reasoning capabilities of modern LLMs into our products. This blog post will discuss our high-level technical approach, some of the challenges we faced in adapting the approach to our domain, and some general ideas on where things might be headed.
Our approach follows the agent paradigm, wherein a generally capable LLM is augmented with access to various tools. The main conversational thread is governed by an orchestration agent, powered by an LLM with the key capabilities of understanding a user’s intent, planning, calling the right tools, and reasoning with tool responses. In the context of Splunk Observability Cloud, this orchestrator can understand a user's request; formulate a plan (a sequence of tool invocations); route requests to the specific microservices provided by Splunk Application Performance Monitoring (APM), Splunk Infrastructure Monitoring, Splunk Log Observer Connect, and so on; and synthesize the tool responses in order to answer the initial request. The LLM is provided with a list of tools with descriptions and signatures. For a complicated task, it can chain multiple tools together to obtain a final answer. For example, if a user asks for “the root cause of the high error rate in payment service”, the orchestrator needs to understand that the user's intention is to troubleshoot a particular service. Through planning, the LLM determines that it first needs to search service names to find one like “payment”, then call APM APIs for an error breakdown, and finally extract information from the breakdown to identify a possible root cause.
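To make this concrete, here is a minimal sketch of how such a plan might be executed once the LLM has emitted it; the tool names, service names, and error data below are hypothetical stand-ins for illustration, not the actual Splunk APIs:

```python
# Minimal sketch of executing an orchestrator's plan. All tool names and
# data are hypothetical stand-ins, not the actual Splunk APIs.

USE_PREVIOUS = object()  # sentinel: feed the previous tool's output in

def find_service(fragment):
    # Hypothetical stand-in for a service-name search tool.
    services = ["paymentservice", "checkoutservice", "adservice"]
    return next((s for s in services if fragment in s), None)

def error_breakdown(service):
    # Hypothetical stand-in for an APM error-breakdown tool.
    data = {"paymentservice": {"HTTP 503": 0.82, "HTTP 400": 0.18}}
    return data.get(service, {})

TOOLS = {"find_service": find_service, "error_breakdown": error_breakdown}

def run_plan(plan):
    """Execute an ordered plan of (tool_name, argument) steps, chaining
    each tool's output into the next step when requested."""
    result = None
    for tool_name, arg in plan:
        result = TOOLS[tool_name](result if arg is USE_PREVIOUS else arg)
    return result

# A plan the orchestrator might emit for
# "root cause of the high error rate in payment service":
plan = [("find_service", "payment"), ("error_breakdown", USE_PREVIOUS)]
```

In this sketch the LLM's role is reduced to producing `plan`; the surrounding system handles the actual tool dispatch and feeds the results back for the final synthesis step.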
Designing efficient and safe system prompts is crucial for guiding the orchestration agent to understand the context and constraints for a user’s query. The system prompt includes:
The orchestrator is equipped with a short-term memory to retain contextual information, enabling it to make appropriate decisions. This memory encompasses the user's current query, the current conversation between the user and the agent, system prompts, and tool descriptions. This memory is short-term as it pertains to the current conversation only. The memory capacity is determined primarily by the context length of the orchestrator's LLM.
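A toy sketch of such a bounded short-term memory, using word counts as a rough stand-in for the model's tokenizer, might look like:

```python
# Sketch of the agent's short-term memory: the context always keeps the
# system prompt and tool descriptions, then as many recent conversation
# turns as the LLM's context budget allows. Word counts stand in for a
# real tokenizer here; this is illustrative, not the production logic.

def build_context(system_prompt, tool_descriptions, turns, budget):
    fixed = [system_prompt] + list(tool_descriptions)
    used = sum(len(text.split()) for text in fixed)
    kept = []
    for turn in reversed(turns):  # walk from the newest turn backwards
        cost = len(turn.split())
        if used + cost > budget:
            break  # the oldest turns fall out of memory first
        kept.append(turn)
        used += cost
    return fixed + list(reversed(kept))  # restore chronological order
```

The key property is that the fixed instructions survive every turn, while old conversation history is the first thing evicted as the context fills.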
Having provided the agent with a general purpose and guidelines, we then conceive of various observability data and platform elements as tools.
In this preview release, the agent’s tools cover the following areas:
Each tool comes with a carefully designed description and parameters. The orchestrator is responsible for tool selection and extracting parameters from the context (e.g., “the past hour” is mapped to a time range object [“-1h”, “now”]).
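The time-range mapping, for instance, could be sketched as a small normalization step; the phrase table below is illustrative, not the production mapping:

```python
# Sketch of parameter extraction for tool calls: a relative time phrase
# in the user's query is normalized to the [start, end] range object a
# tool expects. The phrase table is illustrative only.

RELATIVE_TIMES = {
    "past hour": ("-1h", "now"),
    "past day": ("-1d", "now"),
    "past week": ("-1w", "now"),
}

def extract_time_range(query, default=("-15m", "now")):
    q = query.lower()
    for phrase, time_range in RELATIVE_TIMES.items():
        if phrase in q:
            return time_range
    return default
```

In practice the orchestrator's LLM performs this extraction itself, guided by the tool's parameter descriptions; the table above just makes the input/output contract explicit.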
Some of the tools encapsulate multi-step workflows and are themselves backed by LLMs; we call these specialized agents sub-LLMs since they typically compute on a subset of the context available to the orchestration agent. These focus on handling specific tasks, such as SignalFlow generation, chart creation, root cause analysis, and so on. The specialists retain the ability to invoke other tools in the same manner as typical function calls.
Figure 2. The architecture of the AI Assistant in Observability Cloud
This hybrid strategy yields several advantages:
SignalFlow is the metrics analytics engine at the heart of Splunk Observability Cloud. It is a Python-like language that allows users to transform and analyze incoming streaming data, and write custom charts and detectors. Although SignalFlow is a powerful computational tool, like SPL it has a steep learning curve. We designed a specialized sub-LLM for SignalFlow generation that can generate programs from the user’s natural language queries and task descriptions. For example, if a user asks for “the average cpu utilization”, the agent will generate a SignalFlow program like:
data('cpu.utilization').mean().publish()
We utilized lessons learned from developing the AI Assistant for SPL, and found that chain-of-thought prompting and retrieval augmented generation greatly enhance the sub-LLM’s ability to generate correct programs of moderate complexity, comparable to intermediate SignalFlow users. For the science and engineering details, please refer to our companion blog, "Generative AI for Metrics in Observability."
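As a toy illustration of the retrieval side, one could rank few-shot (question, program) pairs by lexical overlap with the user's query and place the top matches in the generation prompt; the example programs below are illustrative, not drawn from our production corpus:

```python
# Toy sketch of retrieval-augmented generation for SignalFlow: few-shot
# (question, program) pairs are ranked by word overlap with the user's
# query and the top matches go into the prompt. A production system would
# use a stronger retriever (e.g., embeddings); examples are illustrative.

EXAMPLES = [
    ("average cpu utilization",
     "data('cpu.utilization').mean().publish()"),
    ("max memory utilization by host",
     "data('memory.utilization').max(by=['host']).publish()"),
    ("p90 request latency",
     "data('request.latency').percentile(pct=90).publish()"),
]

def retrieve_examples(query, k=2):
    query_words = set(query.lower().split())
    scored = sorted(
        EXAMPLES,
        key=lambda ex: len(query_words & set(ex[0].split())),
        reverse=True,
    )
    return scored[:k]
```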
In addition to task decomposition, tool selection, and query generation, we needed to understand LLM capabilities in processing various data types (metrics, events, logs, and traces). Our general experience is as follows.
As more tools are incorporated into an agent, the difficulty of selecting the right tool increases. This leads to higher error rates on complicated tasks that require chaining multiple tools to arrive at a final answer. For example, a typical troubleshooting journey for a service incident requires a multi-step workflow, such as:
environment name -> service name -> service topology -> service errors -> logs search
Ideally, the agent should be able to follow this workflow and call the functions correctly in sequence. However, on some occasions, the agent may fail to do so, for example by immediately using service topology without getting service/environment names.
Workflow-based Optimization
For complicated tasks, we optimize the orchestration agent by instructing it to follow some typical workflows. This optimization includes three steps:
With such workflow-based optimization, we can improve performance on tasks that require complicated tool use, as the workflow instructions introduce extra domain knowledge for the agent to address these tasks.
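One way to express such a workflow hint, sketched here with invented tool names, is to render the expected tool order directly into the system prompt:

```python
# Sketch of workflow-based optimization: a known troubleshooting workflow
# is rendered as an explicit instruction in the system prompt so the
# orchestrator calls tools in the expected order. Tool names are
# hypothetical, invented for illustration.

WORKFLOWS = {
    "service troubleshooting": [
        "get_environments",
        "find_service",
        "get_service_topology",
        "get_service_errors",
        "search_logs",
    ],
}

def workflow_instruction(task):
    ordered = " -> ".join(WORKFLOWS[task])
    return (f"For {task} tasks, call tools in this order: {ordered}. "
            "Do not skip earlier steps, such as resolving the environment "
            "and service names first.")
```

Injecting the ordering as text keeps the orchestrator flexible (it can still deviate for unusual queries) while strongly biasing it away from failure modes like querying the service topology before resolving a service name.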
In many cases of tool use, the agent needs to extract the right search terms from the user's query to search for information in systems that are typically keyword-based. For example, for the question “What is the average disk usage?”, the agent should retrieve the metric “disk.utilization”. The ideal search terms would be “disk” and “utilization”, but the agent usually extracts “disk” and “usage”, so “disk.utilization” may not be a top hit.
We alleviate this issue by expanding the queries with the knowledge of LLMs. Specifically, we include a list of synonyms for commonly used terms in the observability domain, and expand the search terms using the synonym list. For the above example, the possible search terms can be expanded to “disk”, “usage”, and “utilization”, increasing the recall of the search tool. The result is that keyword-based search systems behave more like semantic search systems.
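The expansion step itself is simple; the sketch below uses a small illustrative synonym list rather than our production vocabulary:

```python
# Sketch of synonym-based query expansion: search terms extracted from
# the user's query are expanded with domain synonyms before hitting the
# keyword-based search backend. The synonym list is illustrative only.

SYNONYMS = {
    "usage": ["utilization", "use"],
    "cpu": ["processor"],
    "memory": ["mem", "ram"],
}

def expand_terms(terms):
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded
```

With this expansion, a query for “disk usage” also searches “utilization”, so “disk.utilization” surfaces as a hit.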
There are two primary challenges for our agent evaluation:
We developed a trajectory-based approach for both data collection and metric definition. The main idea is that, for a given query, we first run the agent for multiple trials, then group identical tool-use trajectories and collect the final responses. During the ground-truth collection phase, we gather all trajectories and responses from the trials and manually select the correct ones. For the evaluation phase, we use two metrics: trajectory match for tool use, and embedding-based similarity for final responses. The following figure shows how we match trajectories: a run is regarded as correct when its order of tool use fully matches the ground truth. To score the final response, we compute the cosine similarity between the embeddings of the agent's response and the ground-truth response.
Figure 3. Overview of the trajectory-based evaluation method
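Both metrics can be sketched in a few lines, with plain vectors standing in for real embedding-model output:

```python
# Sketch of the two evaluation metrics: an exact match over the ordered
# tool-call trajectory, and cosine similarity between response
# embeddings. Plain lists of floats stand in for real embeddings.

import math

def trajectory_match(run, ground_truth):
    # A run counts as correct only when its tool-use order fully matches
    # the ground-truth trajectory.
    return run == ground_truth

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0
```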
We assembled a test set of questions that are representative of the questions our users are likely to ask, and we verified the assistant’s answers by identifying the correct answer in the Splunk Observability Cloud product. Over many iterations of prompt engineering across system prompts, tool descriptions, and workflows, we observed consistent improvement, and the AI Assistant now meets its design requirements with state-of-the-art performance on this test set.
Broadly speaking, we plan to explore how to extend our AI Assistant in multiple ways:
In short, we improve existing skills, we develop new skills, and we find new ways of putting skills together.
As a whole, we find developing LLM applications, especially compared to traditional software development, to be exhilarating: carefully crafted additions to the system prompt can enable what are essentially new features (e.g., novel SignalFlow constructions yielding new insights, APM investigations that might otherwise require expert product understanding), and implementing a tool can unlock new workflows (e.g., navigations between APM and metrics data). The experience was also frustrating at times: apparently trivial modifications to the system could have surprisingly large effects on certain test scenarios, and certain stubborn hallucinations required hard-coded patches. In addition to standard software testing practices, we developed evaluation practices (eventually part of pipelines) with an eye towards helping us make product decisions.
Deploying the AI Assistant internally at Splunk has yielded a wealth of insights regarding user expectations and has helped to focus our research and development efforts. Several of our internal engineering teams are already using the AI Assistant in a range of use cases, from helping new team members better understand their systems, to in-depth analysis of issues in production. We are eager to see how users continue to interact with our AI Assistant and what use cases they hope to address, to further expand the AI Assistant’s skills and workflows.
Co-authors and Contributors:
Joseph Ross is a Senior Principal Applied Scientist at Splunk working on applications of AI to problems in observability. He holds a PhD in mathematics from Columbia University.
Om Rajyaguru is an Applied Scientist at Splunk working primarily on designing, fine-tuning, and evaluating multi-agent LLM systems, along with time series clustering problems. He received his B.S. in Applied Mathematics and Statistics in June 2022; his research focused on multimodal learning and low-rank approximation methods for deep neural networks.
Liang Gou is a Director of AI at Splunk working on GenAI initiatives focused on observability and enterprise applications. He received his Ph.D. in Information Science from Penn State University.
Kristal Curtis is a Principal Software Engineer at Splunk working on a mix of engineering and AI science projects, all with the goal of integrating AI into our products so they are easier to use and provide more powerful insights about users’ data and systems. Prior to joining Splunk, Kristal received her Ph.D. in Computer Science from UC Berkeley, where she studied with David Patterson and Armando Fox in the RAD & AMP Labs.
Akshay Mallipeddi is a Senior Applied Scientist at Splunk. His principal focus is augmenting the AI Assistant in Observability Cloud by improving the data integration aspects critical for large language models, and he is also involved in fine-tuning large language models. He holds an M.S. in Computer Science from Stony Brook University, New York.
Harsh Vashishta is a Senior Applied Scientist at Splunk working on the AI Assistant in Observability Cloud. He holds an M.S. in Computer Science from the University of Maryland, Baltimore County.
Christopher Lekas is a Principal Software Engineer at Splunk and quality owner for the AI Assistant in Observability Cloud. He holds a B.A. in computer science and economics from Swarthmore College.
Amin Moshgabadi is a Senior Principal Software Engineer at Splunk. He holds a B.A.S. from Simon Fraser University.
Akila Balasubramanian is a Principal Software Engineer and the technical owner of the AI Assistant in Splunk Observability Cloud. She is very passionate about building products that help monitor the health and performance of applications at scale. She is a huge believer in dogfooding products and closely collaborating with customers to get direct, candid feedback. She enjoys leveraging her analytical and creative skills to solve problems and deeply values quality over anything else. She holds a master's degree in Computer Science from the University of Illinois.
Sahinaz Safari is a Director of Product Management and the head of AI in Observability. She has a long track record of building and scaling innovative products based on cutting-edge technologies in the Observability domain. Sahinaz holds an M.S. in Electrical Engineering from Stanford University and an MBA from UC Berkeley.
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company with over 7,500 employees, more than 1,020 patents to date, and availability in 21 regions around the world. Splunk offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.