How Observability Helps Technology Leaders Stay Ahead of the Pack

By Splunk

Gartner estimates that by 2022 more than three out of four global organizations will be running containerized applications in production. With this comes a new set of monitoring challenges: ephemeral, short-lived infrastructure, complex service interdependencies and on-call developers who now need access to data for fast troubleshooting, to name a few. As a result, Gartner estimates that a third of enterprises implementing distributed system architectures will have adopted observability tools by 2024 to improve their digital business service performance. Several Splunk product and engineering leaders recently sat down with theCUBE to explore why observability is so critical to business success in today’s digital world.

The Ever-Increasing Complexity of Today’s Infrastructure and Applications

“With 100's of services running on top of 100,000's of containers, the complexity of your environment has grown quite quickly. The fact that those containers may go away as you scale the service up and down to meet demand just adds to that complexity.”
— Patrick Lin, Vice President of Product Management at Splunk

As companies migrate workloads to the cloud and modernize their legacy applications while re-architecting for cloud native environments, they need a single, tightly integrated toolchain that can monitor, troubleshoot, investigate and respond to system behaviors. From an observability perspective, it’s about doing a few things right. Lin continues:

“You need to be tracking this in enough detail and at a high enough resolution in real time, but just as important is understanding the dependencies and the relationships between these different services.”

That is why we have been working to help you understand these dependencies, so that when there is an issue you know where the problem is coming from. With Splunk Observability Cloud, you can understand the health and performance of your distributed applications, along with the impact of your network behavior and underlying infrastructure. We do so with a common data model, designed to scale, that serves as a single source of truth. A purpose-built, real-time streaming architecture keeps pace with change, and a consistent, coherent set of integrated workflows eliminates management complexity.
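To make the value of dependency data concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how Splunk Observability Cloud works: the service names, the DEPENDENCIES map and the likely_origins helper are all hypothetical. Given a map of which service calls which, it narrows a set of alerting services down to those whose own dependencies are healthy, which are reasonable first places to look.

```python
from typing import Dict, List, Set

# Hypothetical service map: each service lists the services it calls.
DEPENDENCIES: Dict[str, List[str]] = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments", "inventory"],
    "catalog": ["inventory"],
    "payments": [],
    "inventory": [],
}


def likely_origins(deps: Dict[str, List[str]], alerting: Set[str]) -> List[str]:
    """Return alerting services none of whose direct dependencies are alerting.

    These are the deepest alerting services in the call graph. Services above
    them may only be failing because of what they call.
    """
    return sorted(
        service
        for service in alerting
        if not any(dep in alerting for dep in deps.get(service, []))
    )


if __name__ == "__main__":
    # frontend and checkout are alerting, but so is payments, which they
    # depend on directly or indirectly.
    print(likely_origins(DEPENDENCIES, {"frontend", "checkout", "payments"}))
    # prints ['payments']
```

Without the dependency map, the three alerts look like three separate problems. With it, they collapse into one candidate origin.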

Implementing Observability the Right Way

But not all observability is created equal, as Arijit Mukherji, Distinguished Architect at Splunk, can attest. Examining Splunk Observability Cloud under the hood, Mukherji notes:

“If you're going to do a lot of sampling, you're going to miss a huge percentage of user interactions. That's probably a recipe for some kind of trouble down the line.”

This is why Splunk’s NoSample™ approach to full-fidelity data ingestion, which captures every single transaction end-to-end without gaps, is so powerful. We are able to capture and correlate what is happening from front-end browsers to backend applications and infrastructure, then detect and alert on critical patterns in seconds using AI-assisted analytics.
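The cost of sampling can be shown with a small, self-contained Python sketch. The numbers are invented for illustration (10,000 transactions, 7 of which failed), and the head_sample function is a generic keep-one-in-N sampler, not a representation of any Splunk product or of how NoSample™ ingestion is implemented.

```python
from typing import List, Tuple

# Hypothetical workload: 10,000 transactions, of which 7 failed.
TOTAL = 10_000
FAILED_IDS = {13, 2_048, 3_777, 5_001, 6_660, 8_192, 9_999}

# Each transaction is (transaction_id, failed).
transactions: List[Tuple[int, bool]] = [(i, i in FAILED_IDS) for i in range(TOTAL)]


def head_sample(items: List[Tuple[int, bool]], keep_one_in: int) -> List[Tuple[int, bool]]:
    """Keep every Nth transaction, decided before the outcome is known."""
    return [t for t in items if t[0] % keep_one_in == 0]


def failures(items: List[Tuple[int, bool]]) -> List[int]:
    """Return the ids of the failed transactions that are present in items."""
    return [tid for tid, failed in items if failed]


if __name__ == "__main__":
    sampled = head_sample(transactions, keep_one_in=10)
    print(len(failures(transactions)))  # prints 7
    print(failures(sampled))            # prints [6660]
```

In this made-up data set, a one-in-ten sample keeps 1,000 transactions but retains only one of the seven failures. The other six user interactions cannot be investigated because they were never recorded.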

Mukherji continues by pointing out that a strategy of buying or building individual pieces (i.e. tool sprawl) will not get you very far, given the inherent complexity of modern IT environments. It is with this commitment to bringing best-in-class products together under a single, integrated observability portfolio that we have extended our foundational acquisitions of SignalFx and Omnition with new additions to the Splunk family. We have rounded out our capabilities with Digital Experience Monitoring through Rigor and Network Performance Monitoring through Flowmill, and we continue to build on our leadership in the APM market with bytecode instrumentation and Java profiling capabilities from Plumbr.

End-to-End Observability to Understand Every User Journey

In the past, an application ran on three tiers in a data center. Today, that same application runs on hundreds of machines across opaque data centers all over the world, with far more moving parts. Often, the only place you see how everything comes together is on the user's desktop. This is where you need observability to offer broader context on everything going on inside your application, starting from the digital experience of your end users and working back.

“When done right, it can tell you the actual end result of all this technology that you're piecing together, of what's getting delivered to your users both quantitatively and qualitatively.”
— Craig Hyde, Senior Director of Product Management at Splunk

Using customer experience as a yardstick, Hyde explains that these metrics will be the most useful and ubiquitous in giving development teams much-needed visibility into how to optimize application performance and availability for better end-user experiences.

Revealing Hidden Network Dependencies

While cloud-native technologies offer tremendous flexibility and speed for application initiatives, troubleshooting application issues is often made harder by the transient nature of distributed systems, which can trigger other, unrelated alerts. SREs need a complete, real-time view of the interactions between thousands of microservices, without developers having to change their code or operations teams having to slow down their production systems. It’s here that we tap into network performance and reliability as a new source of data to help with the broader observability challenge.

In this same discussion with technology leaders, Mike Cohen, Head of Product Management, Network Monitoring at Splunk, explains:

“By taking advantage of technology like eBPF and monitoring from the OS layer, we can actually get visibility into how processes and containers communicate to give us insights into our system.”

Better understanding the scope of a problem helps SREs decide on the right mitigation, so they can better hit their SLAs and drive cost optimization. Cohen adds, “I won't have to hit something with a huge hammer when a really small one might solve the problem.” This turns the network from a liability into a strength in your distributed environments.
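As a rough illustration of what connection-level data makes possible, the Python sketch below aggregates per-connection records into a container-to-container communication map. It starts from records that have already been collected and does not show eBPF itself. The Connection record layout, the container names and the communication_map helper are all hypothetical, not a Splunk or Flowmill data format.

```python
from typing import Dict, List, NamedTuple, Tuple


class Connection(NamedTuple):
    """One observed TCP connection (hypothetical record layout)."""
    src_container: str
    dst_container: str
    bytes_sent: int
    retransmits: int


Edge = Tuple[str, str]


def communication_map(conns: List[Connection]) -> Dict[Edge, Dict[str, int]]:
    """Aggregate per-connection records into totals per (source, destination)."""
    totals: Dict[Edge, Dict[str, int]] = {}
    for c in conns:
        stats = totals.setdefault(
            (c.src_container, c.dst_container),
            {"connections": 0, "bytes_sent": 0, "retransmits": 0},
        )
        stats["connections"] += 1
        stats["bytes_sent"] += c.bytes_sent
        stats["retransmits"] += c.retransmits
    return totals


# Made-up observations for illustration.
observed = [
    Connection("web-1", "cart-1", 1200, 0),
    Connection("web-1", "cart-1", 800, 0),
    Connection("cart-1", "db-1", 500, 3),
    Connection("cart-1", "db-1", 700, 2),
]

if __name__ == "__main__":
    for edge, stats in communication_map(observed).items():
        print(edge, stats)
```

In this made-up data, all retransmits sit on the cart-1 to db-1 edge, which points to a narrow mitigation on that one path rather than a restart of the whole application.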

Learn More

From our conversations with theCUBE, it was clear that companies need to better understand their complex systems and gain insights that help them find root causes and reduce MTTR.

To learn more about how the right observability strategy can improve cross-team collaboration, cost management, and overall business performance, download the ‘Beginner's Guide to Observability’ or contact us for more information.

Ready to take your first steps toward real-time streaming, full-fidelity ingest and AI-driven insights across all your data, at any scale? Watch this demo to see it in action, then start your free trial of Splunk Observability Cloud today.

----------------------------------------------------
Thanks!
Collin Chau

About Splunk

The world’s leading organizations rely on Splunk, a Cisco company, to continuously strengthen digital resilience with our unified security and observability platform, powered by industry-leading AI.

Our customers trust Splunk’s award-winning security and observability solutions to secure and improve the reliability of their complex digital environments, at any scale.