As your tech landscape expands, so does the need for visibility across your ecosystem. Each new service you develop to meet customer needs brings more applications, more servers and more performance data that your teams generate, store and analyze to maintain visibility and reliability. In fact, Statista predicts that organizations’ data volumes will exceed 180 zettabytes by 2025. You could quickly find yourself generating millions of metric time series (stored in Splunk Observability Cloud as MTS) just to monitor the latency of web requests to your network of servers.
However, more data isn’t always better. Collecting and processing performance data across your entire ecosystem with a unique metric for each statistic is not always practical or financially feasible, especially for cloud-forward enterprises. Additionally, pre-calculating an aggregate statistic, such as a percentile, and sending it as a gauge metric to your observability platform can limit your flexibility later, when you build dashboards and troubleshoot issues.
Enter histograms. As defined by the OpenTelemetry Project, a histogram metric data point conveys a population of recorded measurements in a compressed format. Histograms provide customers with a more flexible way to use performance data in their charts and detectors without increasing costs or obscuring meaningful data. With the recent launch of Explicit Bucket Histograms in Splunk Observability Cloud, users now get native support for histograms as a metric type. They can seamlessly ingest, store and query histograms in the platform to efficiently capture distributions of measurements and perform statistical calculations like percentiles.
Explicit bucket histograms sort the measurements recorded for a metric into intervals known as buckets, whose boundaries are defined during instrumentation and do not have to be equally sized. Each bucket tracks the number of observations that fall within its boundaries, and the histogram as a whole also records the count, sum, minimum and maximum of all observations, enabling statistical calculations such as percentiles. Instead of a single value, customers can see how data points are distributed across the buckets to easily identify trends or patterns in their data.
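To make the bucket idea concrete, here is a minimal sketch in plain Python. It is not Splunk's or OpenTelemetry's implementation: the class name, the boundaries and the sample latencies are all hypothetical, and it assumes the OpenTelemetry convention that each bucket includes its upper boundary and that count, sum, minimum and maximum are kept for the histogram as a whole.

```python
from bisect import bisect_left


class ExplicitBucketHistogram:
    """Toy explicit bucket histogram; illustrative only, not a real library API."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One bucket per boundary, plus a final overflow bucket for larger values.
        self.bucket_counts = [0] * (len(self.boundaries) + 1)
        self.count = 0
        self.total = 0
        self.minimum = None
        self.maximum = None

    def record(self, value):
        # bisect_left puts a value equal to a boundary into the bucket that
        # ends at that boundary, so bucket i covers (boundaries[i-1], boundaries[i]].
        self.bucket_counts[bisect_left(self.boundaries, value)] += 1
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


# Hypothetical boundaries and request latencies, both in milliseconds.
latency = ExplicitBucketHistogram([100, 250, 500, 1000])
for measurement_ms in [12, 87, 100, 130, 240, 260, 480, 700, 1500, 90]:
    latency.record(measurement_ms)

print(latency.bucket_counts)  # [4, 2, 2, 1, 1]
print(latency.count, latency.total, latency.minimum, latency.maximum)  # 10 3599 12 1500
```

Ten measurements collapse into five bucket counts and four summary values, and that size stays fixed no matter how many more requests are recorded.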
Explicit bucket histograms are useful for performance data, such as request latency or response time. For example, a user tracking latency for server requests can record every measurement in an explicit bucket histogram in Splunk Observability Cloud and then calculate the statistics they need, such as percentiles, when they query the data.
With native support for histograms in Splunk Observability Cloud, engineering teams get the flexibility to build the visualizations and detectors they need to maintain service reliability and efficiently troubleshoot issues while controlling costs.
Without histogram data support, engineering teams may need to run special infrastructure to pre-aggregate percentiles before sending in their performance data. This could prove costly at scale and require additional toil and reinstrumentation if teams later needed different percentiles for visibility or troubleshooting. Natively ingesting histograms means users can save on these instrumentation costs. Furthermore, by using a histogram to represent a population of multiple time series metrics, users can reduce their MTS volume and lower costs.
Pre-computing gauge metrics to represent required percentiles often relies on guesswork to determine which percentiles might be most valuable, and it limits the types of calculations that can be performed after these metrics are ingested. Users can’t meaningfully aggregate these pre-computed percentiles, and performing additional calculations on them can obscure meaningful insights and lead to inaccurate conclusions.
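A small worked example shows why calculations on pre-aggregated percentiles can mislead. The sketch below uses made-up latencies for two hypothetical servers and a simple nearest-rank percentile; it compares the average of the two servers' p99 values with the p99 of all requests taken together.

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds.
server_a = [20] * 900 + [50] * 100   # 1,000 requests, mild tail
server_b = [20] * 90 + [900] * 10    # 100 requests, severe tail

p99_a = percentile(server_a, 99)
p99_b = percentile(server_b, 99)

# What you get by averaging two pre-computed gauges.
averaged_p99 = (p99_a + p99_b) / 2

# What the p99 actually is across every request.
true_p99 = percentile(server_a + server_b, 99)

print(p99_a, p99_b)   # 50 900
print(averaged_p99)   # 475.0
print(true_p99)       # 50
```

Averaging treats both servers as equally important even though one handled ten times the traffic, so the combined figure lands far from the real p99. Once only the two gauges have been stored, there is no way to recover the correct answer.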
Histograms provide service owners and SREs with greater flexibility when creating charts and detectors in Splunk Observability Cloud. When defining the query for a new chart or detector, users can request any percentile, request a percentile across multiple services and request a percentile over a specific period of time. They can aggregate their data in whatever way best helps them understand performance data and troubleshoot performance issues.
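The reason this flexibility is possible is that histograms sharing the same boundaries can be combined by adding their bucket counts, whether they come from different services or from different time windows. The sketch below illustrates the idea with hypothetical services, boundaries and counts. It is deliberately coarse: it reports the upper boundary of the bucket that contains the requested percentile, whereas a real platform may interpolate within the bucket, and it is not a description of how Splunk Observability Cloud computes percentiles.

```python
# Hypothetical bucket boundaries in milliseconds, shared by both services.
BOUNDARIES = [100, 250, 500, 1000]

# Hypothetical per-bucket request counts; the last entry is the overflow bucket.
checkout_counts = [700, 200, 80, 15, 5]   # 1,000 requests
search_counts = [300, 100, 60, 30, 10]    # 500 requests


def merge(*histograms):
    """Combine histograms with identical boundaries by summing each bucket."""
    return [sum(bucket) for bucket in zip(*histograms)]


def estimate_percentile(boundaries, bucket_counts, p):
    """Upper boundary of the bucket containing the p-th percentile (nearest rank)."""
    total = sum(bucket_counts)
    rank = max(1, (p * total + 99) // 100)  # integer ceil(p * total / 100)
    seen = 0
    for index, count in enumerate(bucket_counts):
        seen += count
        if seen >= rank:
            # The overflow bucket has no upper boundary.
            return boundaries[index] if index < len(boundaries) else float("inf")


merged = merge(checkout_counts, search_counts)
print(merged)  # [1000, 300, 140, 45, 15]

# Any percentile can be chosen at query time, not at instrumentation time.
for p in (50, 90, 99):
    print(p, estimate_percentile(BOUNDARIES, merged, p))
# 50 100
# 90 500
# 99 1000
```

The same `merge` call would work for two histograms from the same service covering consecutive time windows, which is what makes a percentile over a longer period possible.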
There are two ways Splunk Observability Cloud users can start ingesting their histogram data today: with the Prometheus receiver in the OpenTelemetry Collector or with OpenTelemetry libraries. The Prometheus receiver scrapes Prometheus histograms and sends them to Splunk Observability Cloud. Many existing infrastructure components, like Kubernetes and Istio, already make histograms available for scraping. Otherwise, users can instrument their code with OpenTelemetry libraries, which are available for all major programming languages, to send in histograms.
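Prometheus histograms map naturally onto explicit buckets. In the Prometheus text format, each bucket line carries an `le` ("less than or equal") label and a cumulative count, so the per-bucket counts are the differences between neighboring lines. The sketch below shows that conversion on a made-up scrape. It is a simplified illustration that only handles series whose sole label is `le`; it is not how the OpenTelemetry Collector's Prometheus receiver is implemented.

```python
import re

# Made-up scrape output; the metric name and values are hypothetical.
SCRAPE = """\
http_request_duration_seconds_bucket{le="0.1"} 240
http_request_duration_seconds_bucket{le="0.5"} 310
http_request_duration_seconds_bucket{le="1"} 325
http_request_duration_seconds_bucket{le="+Inf"} 330
http_request_duration_seconds_sum 61.5
http_request_duration_seconds_count 330
"""

BUCKET_LINE = re.compile(r'_bucket\{le="([^"]+)"\}\s+(\d+)$')


def per_bucket_counts(scrape_text):
    """Turn cumulative Prometheus buckets into explicit boundaries and per-bucket counts."""
    cumulative = []
    for line in scrape_text.splitlines():
        match = BUCKET_LINE.search(line)
        if match:
            # float() accepts "+Inf", which marks the overflow bucket.
            cumulative.append((float(match.group(1)), int(match.group(2))))
    cumulative.sort()

    boundaries = [le for le, _ in cumulative if le != float("inf")]
    counts = []
    previous = 0
    for _, running_total in cumulative:
        counts.append(running_total - previous)
        previous = running_total
    return boundaries, counts


boundaries, counts = per_bucket_counts(SCRAPE)
print(boundaries)  # [0.1, 0.5, 1.0]
print(counts)      # [240, 70, 15, 5]
```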
Learn more about explicit bucket histograms in Splunk Observability Cloud, and sign up for a free trial to get started today!