The recent release of AI Assistant in Observability Cloud in the US and Europe realms has generated an incredible amount of excitement and interest. But with great power comes great responsibility! As with any shiny new tool, you might not be sure exactly how to use the AI Assistant properly and maximize its potential.
Look no further: this article is for you! I'm a Splunker on the Growth Engineering Marketing team, and I couldn't resist trying the AI Assistant myself. I've been asking it to help with my day-to-day tasks and have found 7 use cases that have improved my observability tremendously!
A simple yet crucial question that every engineer wants to know the right answer to: how is my instance doing? Now, in Splunk, you can just provide the instance name, and the AI Assistant will deliver a thorough analysis of your instance. You might even feel like it's magic—because it really is! Here's a simple AI-generated response from one of my instances as an example.
After maintaining this instance for several years, I’ve never had such deep insight into it! Now, I can confidently review all the AI-recommended optimizations and enhance my system's reliability.
Each request to the application generates a trace ID. In OpenTelemetry, a "trace ID" is a unique identifier assigned to a specific request or operation being monitored by the system. If you've received an alert from APM about one of your backend calls, you can grab the trace ID and ask the AI Assistant to perform an independent analysis of the issue. Here's an example of an AI-generated response based on one of my trace IDs. After reviewing it, you won’t need to say a word—just go ahead and create a ticket for your development team to address the issue immediately.
(Screenshot: the AI-generated trace analysis, organized into Key Observations, Error Analysis, and Recommendations.)
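For context on what you're pasting into the assistant: in OpenTelemetry (which follows the W3C Trace Context convention), a trace ID is a 128-bit value, usually rendered as 32 lowercase hex characters. Here's a minimal, Splunk-agnostic sketch of generating one and sanity-checking an ID before you hand it to a tool; the helper names are my own:

```python
import re
import secrets

def new_trace_id() -> str:
    """Generate a W3C Trace Context-style trace ID: 128 random bits
    rendered as 32 lowercase hex characters (zero-padded)."""
    return format(secrets.randbits(128), "032x")

def looks_like_trace_id(value: str) -> bool:
    """Loose sanity check: exactly 32 lowercase hex chars, and not the
    all-zero ID (which the spec treats as invalid)."""
    return bool(re.fullmatch(r"[0-9a-f]{32}", value)) and set(value) != {"0"}

tid = new_trace_id()
print(tid, looks_like_trace_id(tid))
```

In a real instrumented service the SDK assigns the trace ID for you; this sketch is only meant to show the shape of the identifier you'd grab from an APM alert.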
I’ve always valued feedback on my code, service, or feature: how it performs after going live, whether there are significant bottlenecks, and if it’s meeting expectations. This feedback not only validates the work I’ve done but also highlights areas for improvement, keeping me prepared for necessary changes. And it doesn't all need to be positive, either.
AI assistance shines in this area, offering valuable insights—and sometimes, a boost to your ego! Here’s the feedback I received from AI after a recent implementation to optimize one of the most heavily used service endpoints. It brought a smile to my face.
I asked APM AI to analyze the Production common-service for the past 8 days. It highlighted that the GET /api/bin/careers/joblist endpoint was highly utilized, with 87,001 requests and no errors, showcasing its stability.
The other day, I asked my friend, “AI”, "Can you analyze the memory utilization of my monolithic application over the last six weeks?" Unsurprisingly, it came back with the observation that there might be a memory leak in the application. The AI Assistant said, "I’m sorry to inform you."
My response? "No need to apologize—I should be grateful!" I conducted a similar analysis six months ago after a production issue and reached the same conclusion. However, my team didn’t believe me, and my findings were overlooked. But now, with AI’s analysis in hand, they can't ignore it. Hopefully, this time the team will take action!
The analysis: Based on the analysis of the memory utilization and total memory over the last 24 hours for the AWS instance with the unique ID i-039e37a2bdd344f04_us-west-1_576561709469, here are the findings:
Considering the steady increase in memory utilization without any significant drops, it suggests that the workload on this instance is gradually consuming more memory.
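The kind of trend the assistant flagged, steadily climbing memory with no significant drops, can be approximated with a simple least-squares slope over utilization samples. This is a hypothetical sketch of that idea, not the assistant's actual method; the threshold is an arbitrary assumption:

```python
def memory_trend(samples, slope_threshold=0.05):
    """Fit a least-squares line to evenly spaced utilization samples (in %)
    and flag a possible leak if usage climbs steadily.

    Returns (suspect_leak, slope_per_interval)."""
    n = len(samples)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope > slope_threshold, slope

# Hourly samples over a day: one climbing steadily, one flat.
rising = [40 + 0.5 * i for i in range(24)]
flat = [55.0] * 24
print(memory_trend(rising))  # slope 0.5 per interval -> flagged
print(memory_trend(flat))    # slope 0.0 -> not flagged
```

A real leak investigation would of course look at heap profiles and GC behavior too; a rising slope alone is only a signal that something deserves a closer look.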
We often find ourselves needing to assess whether our current system can handle anticipated traffic—whether it’s a new product launch, a Black Friday surge, or providing assurance to leadership about system capacity. This assessment often involves determining if the system is sufficient or if additional nodes are necessary.
With AI Assistant, you can quickly gain insights into your system’s status and make informed predictions about future requirements. A few weeks ago, I faced a similar challenge. Here’s the analysis provided by the AI—I'll let you be the judge.
My question: Looking at last month's service performance of wmd-srv-prod, can it handle an extra 1000 requests per hour?
The AI's analysis of the service performance for "commons-svc" in the "wmd-svc-prod" environment over the past month indicates the following:
The service has handled a significant number of requests with a relatively low error rate. The current request count per hour is approximately 72 requests (52,015 requests over 720 hours in a month).
Can it handle an additional 1000 requests per hour? Given the current performance metrics, the service appears to be stable with a low error rate. However, handling an additional 1000 requests per hour would significantly increase the load. Here are some considerations and recommendations (see screenshot).
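The 72-requests-per-hour figure is simple arithmetic, and you can sanity-check the headroom question the same way. Using the numbers from the AI's response (the multiplier at the end is my own back-of-the-envelope, not part of the AI output):

```python
monthly_requests = 52_015
hours_in_month = 720

current_rate = monthly_requests / hours_in_month   # ~72.2 requests/hour
extra_rate = 1000                                  # proposed additional load, req/hour

# How many times today's traffic would the service be handling?
load_factor = (current_rate + extra_rate) / current_rate
print(f"current: {current_rate:.1f} req/hr; proposed load is {load_factor:.1f}x today's")
```

Seen this way, the AI's caution makes sense: an extra 1000 requests per hour isn't a marginal bump on top of ~72, it's roughly a fifteen-fold increase, which is why load testing and capacity planning come up in its recommendations.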
If you’re familiar with observability, you know exactly what I mean by "Unknown Unknowns." With AI Assistant, you’ll encounter many surprising discoveries.
For example:
You'll uncover numerous such "Unknown Unknowns" once you start engaging with AI. The importance of this process can’t be overstated when it comes to making your application more reliable and performant. Ultimately, this means a better experience for your end users. It can also translate into real cost savings (e.g., decommissioning ghost servers).
Our system is quite complex (and honestly, which isn't these days?). Bringing a new team member up to speed is a challenge on its own. You provide the code, share documentation links, Zoom recordings, and even give one-on-one demos. Yet, don’t be surprised when follow-up questions come your way, leaving you thinking, "How many times do I need to repeat this?" You’re swamped with sprint work and wish they could just Google it—but they can’t, because it’s our internal application.
This is where the AI Assistant in Splunk Observability Cloud steps in to save the day! It explains the system, breaks down the application, outlines request/response workflows, and maps out the entire transaction flow. As a result, not only does the new person get to know the system inside out, but they might even bring up some interesting insights in meetings—stuff no one else knew, except for AI!
Thanks to Splunk’s AI Assistant, I can stay focused on my tasks while my teammate gains enough knowledge to become productive quickly.
Here is an example of how AI can kick-start your learning:
The dashboard link provided by the AI offers an in-depth view of the service, as seen below:
We can get details about each specific endpoint.
And, then ask AI to explain the Trace ID to understand the application further.
I recently started using Splunk’s AI Assistant in Observability Cloud, and I’m blown away by what it brings to observability. I’m excited to share my experience with all of you! I hope the common use cases above help you and your team leverage GenAI to deliver an outstanding customer experience.
Why wait? Let’s dive in! If you're already on Splunk Observability Cloud, you can explore the use cases above for yourself. If not, you can learn more and sign up for a free trial here. You can also check out these AI Assistant resources.
Download the AI eBook here and I’ll see you on the other side!
The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.
Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.