Welcome to the third blog of the Observability Center of Excellence (O11y CoE) series! If you’ve been following along, we discussed the why behind an O11y CoE and explored how to assemble and structure the team to make it a reality. Now, we’re ready to dive deeper into one of the CoE’s critical functions: defining and measuring Observability as a Service (OaaS).
In the context of an Observability CoE, OaaS is the operating model for delivering observability capabilities to the organization. Much like other "as a Service" models, OaaS focuses on providing observability as a scalable, measurable, and value-driven practice that supports teams across the business. To determine its effectiveness, OaaS must itself be instrumented, just like the systems it aims to monitor.
Is your observability practice positioned to help teams resolve incidents faster, reduce downtime, and optimize performance? Defining some base KPIs early in your journey not only helps the CoE answer these questions but also enables it to leverage data to understand what’s working (and what’s not). These KPIs provide visibility into the CoE’s value, empowering it to continuously refine and improve its delivery of observability services.
In this blog, we’ll explore:
By the end, you’ll have the tools and insights to ensure your Observability CoE is delivering measurable value through OaaS, setting the stage for future enhancements like maturity assessments and tactical implementations.
A fellow Splunker created a great article on KPIs, OKRs, and metrics, breaking down their distinctions and how they complement each other. The gist is simple: KPIs monitor ongoing performance and historical trends, while OKRs align teams around strategic goals and measurable outcomes.
KPIs (Key Performance Indicators) are like the operational pulse of your observability practice. They answer questions like, “What’s happening right now?” and “What trends have emerged over time?” These indicators provide a near-real-time and historical view into the health of your OaaS, helping you identify trends, measure effectiveness, and take action.
OKRs (Objectives and Key Results) are about where you want to go. They combine a clear objective (the goal) with measurable key results that track progress toward it. While KPIs tell you what’s happening, OKRs drive strategic alignment and improvement.
Imagine your Observability CoE tracks a KPI called Agent Saturation, which measures the percentage of available resources instrumented with observability agents. This KPI shows how comprehensively your environment is covered.
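The Agent Saturation calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming you can export periodic counts of in-scope resources and of resources running an observability agent (for example, from an asset inventory and your agent-management tooling); the `InventorySnapshot` type, the `agent_saturation` helper, and the sample numbers are all hypothetical. Keeping a history of snapshots gives you both the current value and the trend that a KPI is meant to show.

```python
from dataclasses import dataclass


@dataclass
class InventorySnapshot:
    """Hypothetical point-in-time count of in-scope resources (e.g., hosts)."""
    period: str
    total_resources: int
    instrumented_resources: int


def agent_saturation(snapshot: InventorySnapshot) -> float:
    """Percentage of available resources that have an observability agent installed."""
    if snapshot.total_resources == 0:
        # Design choice for this sketch: report 0% when nothing is in scope.
        return 0.0
    return 100.0 * snapshot.instrumented_resources / snapshot.total_resources


# Hypothetical quarterly snapshots; real values would come from your inventory data.
history = [
    InventorySnapshot("Q1", total_resources=800, instrumented_resources=480),
    InventorySnapshot("Q2", total_resources=850, instrumented_resources=561),
    InventorySnapshot("Q3", total_resources=900, instrumented_resources=648),
]

for snap in history:
    print(f"{snap.period}: {agent_saturation(snap):.1f}% agent saturation")

# Trend: change in percentage points between the first and the latest snapshot.
trend = agent_saturation(history[-1]) - agent_saturation(history[0])
print(f"Change since {history[0].period}: {trend:+.1f} points")
```

Note that the denominator grows along with the environment, so saturation can drop even while agents are being deployed if new resources arrive faster than they are instrumented.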
In this case, the KPI provides the current state and historical context, while the OKR establishes the target state and timeframe for improvement. Together, they ensure the CoE can monitor progress while driving a strategic outcome.
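To make the KPI/OKR relationship concrete, the sketch below scores a key result by how much of the distance from its baseline to its target the latest KPI reading has covered. The OKR shown (raising Agent Saturation from 60% to 90%), the `KeyResult` type, and the `progress` helper are hypothetical examples, not a prescribed scoring method.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    """Hypothetical key result: move a KPI from a baseline value to a target value."""
    description: str
    baseline: float  # KPI value when the OKR was set
    target: float    # KPI value the OKR commits to reaching


def progress(kr: KeyResult, current: float) -> float:
    """Fraction of the baseline-to-target distance covered, clamped to [0, 1]."""
    span = kr.target - kr.baseline
    if span == 0:
        # Degenerate OKR: the baseline already equals the target.
        return 1.0 if current == kr.target else 0.0
    # Dividing by the signed span also handles lower-is-better KPIs (target < baseline).
    fraction = (current - kr.baseline) / span
    return max(0.0, min(1.0, fraction))


# Hypothetical OKR: raise Agent Saturation from 60% to 90% by the end of the year.
kr = KeyResult("Agent Saturation reaches 90%", baseline=60.0, target=90.0)

# The KPI supplies the current reading; the OKR supplies the baseline and target.
for quarter, kpi_value in [("Q2", 66.0), ("Q3", 72.0), ("Q4", 91.5)]:
    print(f"{quarter}: {progress(kr, kpi_value):.0%} of the way to target")
```

Because the score divides by the signed baseline-to-target distance, the same helper also works for lower-is-better KPIs, such as a goal to cut the time it takes to resolve incidents.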
KPIs and OKRs complement each other by ensuring your OaaS practice is operationally effective and strategically aligned:
Together, they create a feedback loop: KPIs show how close you are to achieving your OKRs, while OKRs ensure you’re focusing on initiatives that deliver meaningful value.
By distinguishing between KPIs and OKRs, your Observability CoE can build a framework that tracks progress, measures success, and aligns with organizational goals. In the next section, we’ll explore what makes a good KPI and common pitfalls to avoid.
Any service offering thrives on actionable, meaningful, and relevant KPIs that provide insights into what’s working—and what isn’t. A well-chosen KPI doesn’t just measure performance; it also drives continuous service improvement and supports broader objectives, such as enabling the Observability CoE (O11y CoE) to achieve its OKRs.
For those looking for a deep dive into the nuances of good vs. bad KPIs, I recommend checking out this Splunk article on KPI management. It explores how to identify impactful KPIs, avoid common mistakes, and set up management frameworks.
Defining KPIs is as much about knowing what to avoid as it is about selecting the right metrics. Some common pitfalls include:
The Observability CoE is central to ensuring success with both KPIs and OKRs. By defining actionable KPIs early and aligning them with clear OKRs, the CoE can:
Defining KPIs isn't just about tracking progress; it's about laying the foundation for a successful OaaS model. By explicitly integrating OKRs, your O11y CoE gains the ability to continuously adapt, refine, and enhance its value proposition. This alignment ensures that observability practices deliver continuous, iterative value to the business, keeping the organization responsive and competitive.
When identifying KPIs for your Observability CoE, it’s useful to group them into categories based on their focus and purpose. To quickly recap, OaaS KPIs should help assess whether your OaaS operating model is effectively delivering, or is positioned to deliver, observability capabilities to the organization. Organizing KPIs into categories ensures your measurements are actionable and aligned with the outcomes your OaaS practice strives to achieve.
Later in this blog, I’ll provide specific examples of O11y KPIs, including their descriptions, purposes, calculations, potential data sources, and which category they fall under. For now, let’s explore the core KPI categories:
Focus: Ensuring observability tools and platforms are operational and accessible. This type of KPI tracks the reliability of your observability ecosystem, helping you answer questions like:
Focus: Monitoring the deployment and use of observability tools and resources. Utilization KPIs measure things like license usage, tool versioning, and deployment coverage, ensuring you’re getting the most out of your investments. Key questions include:
Focus: Measuring engagement with observability tools and practices across teams and environments. Adoption KPIs cover two key dimensions:
Focus: Enhancing efficiency and reducing noise. Optimization KPIs evaluate how well your observability practice reduces unnecessary alerts, improves workflows, and minimizes manual effort. These KPIs tackle questions like:
By organizing KPIs into these types, you can align your measurements with the strategic goals of your CoE and your organization. In the next section, we’ll take a look at some specific examples of OaaS KPIs, explaining their purpose, how to calculate them, and some practical “pro-tips” based on my experience.
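One way to keep the categories above actionable is to record each KPI in a small catalog alongside its description, purpose, data source, and calculation. The sketch below assumes that structure; the four KPIs, their input names, data sources, and sample values are hypothetical illustrations, and "Availability" is an assumed label for the first category (the one focused on keeping observability tools operational and accessible).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Category(Enum):
    AVAILABILITY = "Availability"  # assumed name for the "operational and accessible" category
    UTILIZATION = "Utilization"
    ADOPTION = "Adoption"
    OPTIMIZATION = "Optimization"


@dataclass
class Kpi:
    """Hypothetical catalog entry: description, purpose, data source, category, calculation."""
    name: str
    category: Category
    description: str
    purpose: str
    data_source: str
    calculate: Callable[[Dict[str, float]], float]


def ratio_pct(numerator: str, denominator: str) -> Callable[[Dict[str, float]], float]:
    """Build a calculation returning inputs[numerator] / inputs[denominator] as a percentage."""
    def calc(inputs: Dict[str, float]) -> float:
        total = inputs[denominator]
        # Report 0% rather than raising when the denominator is zero.
        return 100.0 * inputs[numerator] / total if total else 0.0
    return calc


CATALOG = [
    Kpi("Platform Availability", Category.AVAILABILITY,
        "Share of time the observability platform was reachable",
        "Confirms teams can rely on the tooling during incidents",
        "Synthetic checks against the platform (assumed)",
        ratio_pct("minutes_up", "minutes_total")),
    Kpi("License Utilization", Category.UTILIZATION,
        "Share of licensed capacity currently consumed",
        "Shows whether the investment is under- or over-used",
        "Vendor license and usage reports (assumed)",
        ratio_pct("units_used", "units_licensed")),
    Kpi("Team Adoption", Category.ADOPTION,
        "Share of engineering teams actively using observability tools",
        "Measures the organizational reach of the OaaS offering",
        "Platform user and team activity data (assumed)",
        ratio_pct("active_teams", "total_teams")),
    Kpi("Actionable Alert Ratio", Category.OPTIMIZATION,
        "Share of fired alerts that led to an action",
        "Highlights noisy alerts that can be tuned or removed",
        "Alerting and incident tool history (assumed)",
        ratio_pct("actioned_alerts", "total_alerts")),
]

# Hypothetical inputs for one reporting period (a 30-day month is 43,200 minutes).
inputs = {
    "minutes_up": 43_156, "minutes_total": 43_200,
    "units_used": 750, "units_licensed": 1_000,
    "active_teams": 18, "total_teams": 40,
    "actioned_alerts": 120, "total_alerts": 480,
}

for kpi in CATALOG:
    print(f"[{kpi.category.value}] {kpi.name}: {kpi.calculate(inputs):.1f}%")
```

Storing the calculation with each catalog entry keeps the definition and the math in one place, which makes it easier to review and refine KPIs as your OaaS practice matures.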
Now that you’ve explored the critical role KPIs play in defining and measuring Observability as a Service (OaaS), it’s time to put these ideas into action. Here's your call to action:
If you’re passionate about learning more about observability, I’d encourage you to check out my teammates’ Observability content on Splunk’s community blog and watch some of our latest videos on YouTube (Splunk Observability for Engineers).