By 2022, Gartner estimates that more than three out of four global organizations will be running containerized applications in production. With this comes a new set of monitoring challenges: ephemeral, short-lived infrastructure; complex service interdependencies; and on-call developers who now need fast access to data for troubleshooting, to name just a few. As a result, Gartner estimates that a third of enterprises implementing distributed system architectures will have adopted observability tools by 2024 to improve the performance of their digital business services. Several Splunk product and engineering leaders recently sat down with theCUBE to explore why observability is so critical to business success in today's digital world.
“With hundreds of services running on top of hundreds of thousands of containers, the complexity of your environment has grown quite quickly. The fact that those containers may go away as you scale the service up and down to meet demand just adds to that complexity.”
— Patrick Lin, Vice President of Product Management at Splunk
As companies migrate workloads to the cloud, modernize their legacy applications and re-architect them for cloud-native environments, they need a single, tightly integrated toolchain that can monitor, troubleshoot, investigate and respond to system behavior. From an observability perspective, it's about doing a few things right. Lin continues:
“You need to be tracking this in enough detail and at a high enough resolution in real time, but just as important is understanding the dependencies and the relationships between these different services.”
That's why we have been working to help you understand these dependencies, so that when there's an issue you know where the problem is coming from. With Splunk Observability Cloud, you can understand the health and performance of your distributed applications, as well as the impact of network behavior and the underlying infrastructure. We do this with a common data model that serves as a single source of truth designed to scale, and a purpose-built, real-time streaming architecture that keeps pace with change. Both are supported by a consistent, coherent set of integrated workflows that eliminates management complexity.
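To make the idea of "knowing where the problem is coming from" more concrete, here is a minimal, hypothetical Python sketch of one common heuristic for reading a service dependency map. It assumes a simple call graph (`calls`) and a set of services currently alerting (`unhealthy`); both, along with the function name `likely_root_causes`, are illustrative assumptions and not a description of how Splunk Observability Cloud works internally. The heuristic flags unhealthy services whose downstream dependencies are all healthy, on the assumption that failures propagate upstream along call edges.

```python
# Hypothetical sketch: use a service dependency map to narrow down where a problem starts.
# The call graph, service names and health data below are illustrative assumptions.


def likely_root_causes(calls: dict[str, list[str]], unhealthy: set[str]) -> set[str]:
    """Return unhealthy services whose downstream dependencies are all healthy.

    If a service is failing but nothing it calls is failing, the problem most
    likely originates there rather than further downstream.
    """
    return {
        service
        for service in unhealthy
        if not any(dep in unhealthy for dep in calls.get(service, []))
    }


# Example topology: frontend -> checkout -> payments -> database, frontend -> catalog
calls = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments"],
    "payments": ["database"],
    "catalog": [],
    "database": [],
}

# Alerts are firing on three services, but only one of them is the likely origin.
unhealthy = {"frontend", "checkout", "payments"}
print(likely_root_causes(calls, unhealthy))  # {'payments'}
```

Real environments have shared dependencies, retries and cycles that this sketch ignores, so treat it as an illustration of why dependency context matters rather than a complete root-cause algorithm.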
But not all observability is created equal, as Arijit Mukherji, Distinguished Architect at Splunk, can attest. Examining Splunk Observability Cloud under the hood, Mukherji notes:
“If you're going to do a lot of sampling, you're going to miss a huge percentage of user interactions. That's probably a recipe for some kind of trouble down the line.”
This is why Splunk's NoSample™ approach, full-fidelity data ingestion that captures every single transaction end to end without any gaps, is so powerful. We capture and correlate what's happening from front-end browsers through backend applications and infrastructure, and use AI-assisted analytics to detect and alert on critical patterns in seconds.
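To illustrate Mukherji's point about sampling, here is a small, hypothetical Python simulation comparing full-fidelity capture with 10% head-based sampling over synthetic traffic. The names `Trace`, `generate_traces` and `head_sample`, the traffic volume and the failure pattern are all illustrative assumptions, not Splunk APIs or measured data. To keep the result reproducible, the sampler keeps one trace in ten by ID, a deterministic stand-in for the random sampling real tracers typically use; either way, the keep-or-drop decision is made before anyone knows whether the interaction will fail.

```python
# Hypothetical illustration: how many failing interactions does 10% head-based sampling keep?
# Trace, generate_traces and head_sample are illustrative names, not Splunk APIs.
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: int
    is_error: bool


def generate_traces(n: int, error_every: int) -> list[Trace]:
    """Create n synthetic traces; a trace fails when its ID leaves remainder 3 mod error_every."""
    return [Trace(i, i % error_every == 3) for i in range(n)]


def head_sample(traces: list[Trace], keep_one_in: int) -> list[Trace]:
    """Keep one trace in `keep_one_in`, decided by ID alone, before the outcome is known.

    This is a deterministic stand-in for random head-based sampling.
    """
    return [t for t in traces if t.trace_id % keep_one_in == 0]


def count_errors(traces: list[Trace]) -> int:
    """Count the failing interactions in a list of traces."""
    return sum(1 for t in traces if t.is_error)


traces = generate_traces(10_000, error_every=7)
sampled = head_sample(traces, keep_one_in=10)

print(f"full fidelity: {count_errors(traces)} failing interactions")   # 1429
print(f"10% sampling:  {count_errors(sampled)} failing interactions")  # 143
```

In this toy run, sampling retains only about one in ten failing interactions, so roughly 90% of the problems users actually hit never reach the troubleshooting workflow.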
Mukherji continues by pointing out that a strategy of buying or building individual pieces (that is, tool sprawl) will not get you very far, given the inherent complexity of modern IT environments. It is this commitment to bringing best-in-class products together in a single, integrated observability portfolio that led us to extend our foundational acquisitions of SignalFx and Omnition with new additions to the Splunk family. Rigor rounds out our capabilities with Digital Experience Monitoring and Flowmill with Network Performance Monitoring, while byte-code instrumentation and Java profiling capabilities from Plumbr continue to build on our leadership in the APM market.
In the past, an application typically ran on three tiers in a data center. Today, that same application runs on hundreds of machines across opaque data centers all over the world, with far more moving parts. Often, the only place you see how everything comes together is on the user's desktop. This is where observability provides the broader context you need to see everything that's going on inside your application, starting from the digital experience of your end users and working back.
“When done right, it can tell you the actual end result of all this technology that you're piecing together, of what's getting delivered to your users both quantitatively and qualitatively.”
— Craig Hyde, Senior Director of Product Management at Splunk
Using customer experience as the yardstick, Hyde explains, the metrics you pull will be the most useful and ubiquitous, giving developer teams much-needed visibility into how to optimize application performance and availability for a better end-user experience.
While cloud-native technologies offer tremendous flexibility and speed for application initiatives, troubleshooting application issues is often complicated by the transient nature of distributed systems, which can trigger unrelated alerts. SREs need a complete, real-time view of the interactions among the thousands of microservices their developers write, without requiring developers to change their code or operations teams to slow down production systems. This is where we tap into network performance and reliability as a new source of data for the broader observability challenge.
In this same discussion with technology leaders, Mike Cohen, Head of Product Management, Network Monitoring at Splunk, explains:
“By taking advantage of technology like eBPF and monitoring from the OS layer, we can actually get visibility into how processes and containers communicate to give us insights into our system.”
Better understanding the scope of a problem helps SREs choose the right mitigation, better hit their SLAs and drive cost optimization. As Cohen puts it, “I won't have to hit something with a huge hammer when a really small one might solve the problem.” That turns the network from a liability into a strength in your distributed environments.
From our conversations with theCUBE, it was clear that companies need to better understand their complex systems in order to gain the insights that help them find root causes and reduce MTTR.
To learn more about how the right observability strategy can improve cross-team collaboration, cost management, and overall business performance, download the ‘Beginner's Guide to Observability’ or contact us for more information.
Ready to take your first steps toward real-time streaming, full-fidelity ingest and AI-driven insights across all your data, at any scale? Watch this demo to see it in action, then start your free trial of Splunk Observability Cloud today.
----------------------------------------------------
Thanks!
Collin Chau