
Building a Leading Observability Practice Starts With Culture

Observability leaders are aware of issues 2.8x faster than their peers — but getting there often requires a culture shift.

By Patrick Lin, SVP of Observability

January 9, 2025, 6 minute read

There’s a big difference between good and great. Moving an observability practice from standard to leading can eliminate late-night debugging, war rooms with hordes of people, and calls from angry customers.

Our annual report, State of Observability 2024: Charting a Course to Success, identified an elite group of leaders who are pulling ahead of their peers. A leading observability practice experiences more successful launches, develops and ships code faster, and fixes issues before they become catastrophic. Innovation and resilience aren’t mutually exclusive; rather, they are completely intertwined.

How does an organization go from good (or just okay) to great? It’s all in the details.

What defines an observability leader?

State of Observability 2024 respondents were categorized into levels of maturity (beginner, emerging, evolving, and leader) based on 20 self-reported data points. Respondents were more likely to achieve leader status when they adopted several best practices, including having visibility across all their stacks, the ability to proactively identify issues, and the ability to detect issues with context.

Our data revealed several outcomes that leaders experience at a higher level than their peers:

Develop better code, faster. With customer expectations at an all-time high, fast product development is crucial — and developers are the power behind that competitive differentiation. With a strong observability foundation, developers can focus less on fighting fires and more on what they’re good at: writing quality code. Over three-quarters (76%) of leading organizations push the majority of their code on demand — 2.6x more than beginning organizations.

Platform engineering is one route that advanced observability practices are taking to achieve these outcomes. Over three-quarters (78%) employ this discipline extensively within their organizations, and nearly half (48%) say that it contributes to higher developer productivity. Platform engineers do the work that makes it simpler for development teams to get the visibility they need — for example, standardizing the mechanisms and naming conventions used for data collection. This enables software engineers to focus on what they do best — creating reliable, secure software.

Find and fix issues before they cause damage. Leading organizations also have the advantage of speed when it comes to resilience, with 68% saying they’re aware of application problems within minutes or seconds of an outage or slowed performance — 2.8x the rate of beginning organizations.

A focus on improving alert confidence can reduce conjecture and useless fire drills, ultimately helping teams resolve incidents faster. Leaders report upwards of 80% alert confidence, compared to only 54% for beginners. Knowing with certainty that an alert is tied to a real issue can inspire teams to act fast, rather than raising suspicion or ignoring alerts altogether.
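One way a team might put a number on alert confidence is to track what share of fired alerts turned out to be tied to a real issue. The sketch below is only an illustration under that assumption: the report's figures are self-reported, and the alert names, the `real_issue` field, and the `alert_precision` function are hypothetical.

```python
def alert_precision(alerts):
    """Return the fraction of fired alerts that were tied to a real issue.

    Each alert is a dict with a boolean "real_issue" field (a hypothetical
    label a team would assign during post-incident review).
    """
    if not alerts:
        return 0.0
    actionable = sum(1 for alert in alerts if alert["real_issue"])
    return actionable / len(alerts)


# Hypothetical review of five alerts fired during one week.
alerts = [
    {"name": "checkout-latency", "real_issue": True},
    {"name": "disk-usage", "real_issue": True},
    {"name": "cpu-spike", "real_issue": False},
    {"name": "error-rate", "real_issue": True},
    {"name": "queue-depth", "real_issue": True},
]

confidence = alert_precision(alerts)
print(f"Share of alerts tied to a real issue: {confidence:.0%}")
```

Tracking a figure like this over time can show whether alert tuning is actually reducing fire drills.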

Harness the power of AI. AI can help observability teams make sense of their data and turn noise into insights. Leading organizations aren’t just adopting AI more (they use AIOps in their toolsets over 10x more than beginners), they’re reaping more benefits from it. Forty percent of leaders say the ROI of their AIOps tools far exceeded their expectations, compared to only 6% of beginners who say the same.

AIOps adoption is most successful when organizations focus on specific use cases and goals that an AI-powered observability platform can assist with. The value of AIOps lies in its ability to apply analytics to large volumes of observability data to drastically improve anomaly detection, alert noise reduction, root cause analysis, automation, and incident prevention. Leaders recognize this; they are most often using AIOps to determine root cause and remediate incidents with greater intelligence (65%) and consolidate data from multiple monitoring systems (60%).

Get better ROI from observability. Leaders’ successes go straight to the bottom line. They squeeze more value out of their overall observability investments, receiving an annual return that’s 2.67x their spend. They report greater benefits from their observability solutions, too; 95% say that observability improves detection time, and 92% say it helps to speed up app development time.

It’s no surprise, then, that leaders are more confident in their observability solutions to achieve powerful outcomes. For example, 67% of leading teams are extremely confident that they can minimize customer impact during incidents.

How to achieve an observability-first mindset

Observability isn’t something you have; it’s something you do. A leading observability practice isn’t built overnight, and it can’t be achieved by technology alone. Consider these practical strategies to make observability excellence a reality.

Focus on building a customer-first culture of excellence. A leading observability practice is the result of an intentional culture that’s developed thoughtfully over time. Leading organizations care about observability for the sake of creating excellent digital experiences, not simply to avoid the bad ones. They build a culture of not accepting things as they are, but pushing for what they could be. They also keep educating themselves about strategies for achieving observability, then put that knowledge into action through tools, training, and processes.

This means being obsessed with delivering excellent digital experiences to customers. Leaders embed a customer-first mindset into every decision. They measure their team’s performance not only through key DORA metrics but also through broader KPIs like site performance and net promoter score (NPS), and weave them all together for a richer understanding of customer experience.

Delighted customers are just the tip of the iceberg. Leading organizations also prioritize developer experience by enabling developers to experiment and innovate. They lead by example by incorporating innovative technologies and industry standards like generative AI and OpenTelemetry.

Let your engineering team experiment. Successful engineering teams are infused with creativity, passion, and experimentation — and leading organizations give these types of teams the space to thrive. Developers at leading organizations spend about 38% more time on innovation, versus maintenance and other rote tasks.

If your organization is stuck in firefighting mode, cutting down on “emergencies” should be the top focus. Prioritize strategic planning and approaches that improve operational efficiencies, like platform engineering. Once these initiatives free up developers’ time and cut down on drudgery, leaders can encourage software engineering teams to think creatively and flex their problem-solving skills. Hackathons, for example, can inspire engineers to turn their creative side projects into real product features.

Take a ‘sharing is caring’ approach to data. Engineering and security teams will always have their own priorities, goals, and needs. When it comes to the data they need to access to do their jobs, however, there’s a massive overlap. Sharing data can provide richer context to both teams, enabling them to more quickly determine whether the symptoms they are observing during an incident are the result of a bad code push, a bad hard drive, or a bad actor. Leading organizations recognize this and put it into practice; 73% improved MTTR when they brought observability and security tools and workflows together.

If your organization doesn’t already share data across these teams, taking an incremental approach can ease potential tension and get both teams comfortable. First, find common data sources and workflows that both teams rely on. Then converge a few high-priority, low-sensitivity data sources and fine-tune the workflows surrounding them.

Control your telemetry pipeline. As the volume of telemetry data keeps rising, data management strategies are becoming increasingly vital for ensuring that the cost of the data collected, processed, and stored is proportional to its value. Leaders are more aware of this than their counterparts and are more likely to agree that strategies like data tiering and aggregation (the process of tuning the granularity or frequency of collected data to be what's needed, and not more) are critical ways to control costs. Beyond costs, these approaches can give organizations more control over how much data they emit, where they send it, and how they send it, enabling them to squeeze more value out of their data and derive meaningful insights from it.
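As a rough illustration of aggregation, the sketch below rolls per-second metric samples up into one-minute buckets before they are stored. It is a minimal example rather than any particular product's pipeline; the `downsample` function, the 60-second bucket width, and the sample latency values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds=60):
    """Aggregate (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean, max, count) tuples sorted by
    bucket start, so each bucket keeps a summary instead of every raw point.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, mean(values), max(values), len(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical input: 180 one-second latency samples (milliseconds).
samples = [(t, 100.0 + (t % 60)) for t in range(180)]

rollup = downsample(samples, bucket_seconds=60)
print(f"{len(samples)} raw samples reduced to {len(rollup)} buckets")
for start, avg, peak, count in rollup:
    print(f"t={start:>3}s  mean={avg:.1f}ms  max={peak:.1f}ms  n={count}")
```

Keeping the maximum alongside the mean is one design choice that preserves spikes an average alone would hide; which summaries to keep depends on how the data will be used.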

Fostering a culture that practices and continually improves on its approach to observability is the foundation for becoming a leader: an organization that can develop better code faster and resolve problems earlier. And while doing so takes time and effort, a cultural shift will reap benefits both within observability and throughout the business.

Read the full report to learn more about how leading observability practices are building resilience and incorporating innovation. Explore other strategies that leaders lean on, like platform engineering, generative AI, and OpenTelemetry.
