
January 03, 2024


RED Monitoring: Rate, Errors, and Duration

By Stephen Watts

The RED method is a streamlined approach for monitoring microservices and other request-driven applications, focusing on three critical metrics: Rate, Errors, and Duration. Originating from the principles established by Google's "Four Golden Signals," the RED monitoring framework offers a pragmatic and user-centric perspective on service performance.

Key Components of RED Monitoring

The RED monitoring method is tailored to enhance end-user satisfaction by focusing on three metrics:

Rate

Rate tracks the number and, in certain contexts, the size of requests, such as photo uploads in a photo hosting service. Monitoring rate is crucial in environments susceptible to failures under peak traffic, and both spikes and drops in requests are significant.
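As a rough illustration of tracking rate, the sketch below buckets request timestamps into fixed windows and reports the relative change between two windows. The function names, the 60-second window, and the input shape (a list of arrival times in seconds) are assumptions made for this example, not part of any particular monitoring product.

```python
from collections import Counter


def requests_per_window(timestamps, window_seconds=60):
    """Count requests in each fixed-size time window.

    timestamps: request arrival times in seconds (hypothetical input shape).
    Returns a dict mapping window start time -> request count.
    """
    counts = Counter()
    for ts in timestamps:
        window_start = int(ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


def rate_change(previous, current):
    """Relative change between two window counts (0.5 means +50%)."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


windows = requests_per_window([0, 10, 59, 60, 61, 130])
print(windows)

# Both a spike and a drop are worth noticing.
print(rate_change(100, 150), rate_change(100, 40))
```

A positive result signals a spike and a negative one a drop; which magnitude counts as significant depends on the service's normal traffic pattern.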

Errors

Errors counts the number of failed requests per second. Error rates provide insight into the reliability and quality of the service. An error is any issue that leads to an incomplete or incorrect result, and it calls for prompt resolution.
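A minimal sketch of the error metric follows. It assumes HTTP-style status codes and treats any 5xx response as a failed request; that definition of "failed" is an assumption for illustration, and a real service may also need to count timeouts or incorrect results.

```python
def error_stats(status_codes, window_seconds):
    """Summarize failures for one observation window.

    status_codes: status of every request seen in the window (hypothetical input).
    Treating codes >= 500 as failures is an assumption of this sketch.
    """
    total = len(status_codes)
    failed = sum(1 for code in status_codes if code >= 500)
    return {
        "errors_per_second": failed / window_seconds,
        "error_ratio": failed / total if total else 0.0,
    }


stats = error_stats([200, 200, 500, 503, 404, 200], window_seconds=60)
print(stats)
```

Reporting the ratio alongside the per-second count helps separate "more errors because there is more traffic" from "a larger share of requests is failing".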

Duration

Duration records the time taken for each request, which is crucial for assessing the service's responsiveness and efficiency on both the client side and the server side. Duration metrics are also vital for establishing the sequence of events, particularly in complex microservices environments. In applications involving multiple services, pinpointing issues requires understanding...

The time spent on requests
Error occurrences
Request volume per service
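The three per-service questions above can be sketched as a single grouping step. The record shape used here, a tuple of (service, duration_ms, ok), and the service names are hypothetical; real telemetry would carry more fields.

```python
from collections import defaultdict
from statistics import median


def red_by_service(requests, window_seconds):
    """Group request records by service and summarize each group.

    requests: iterable of (service, duration_ms, ok) tuples (hypothetical shape).
    """
    grouped = defaultdict(list)
    for service, duration_ms, ok in requests:
        grouped[service].append((duration_ms, ok))

    summary = {}
    for service, rows in grouped.items():
        durations = [duration for duration, _ in rows]
        summary[service] = {
            "rate_per_s": len(rows) / window_seconds,      # request volume
            "errors": sum(1 for _, ok in rows if not ok),  # error occurrences
            "median_ms": median(durations),                # time spent
            "max_ms": max(durations),
        }
    return summary


sample = [
    ("checkout", 120, True),
    ("checkout", 480, False),
    ("search", 35, True),
    ("search", 40, True),
    ("search", 45, True),
    ("search", 38, True),
]
for service, metrics in red_by_service(sample, window_seconds=2).items():
    print(service, metrics)
```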

Duration generally falls into the realm of distributed tracing, with standards such as OpenTracing and OpenTelemetry. Distributed tracing tracks the path and time your requests take between and within services, and places events in causal order.
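To make "causal order" concrete, here is a small sketch that arranges spans so that each parent comes before its children, with siblings sorted by start time. The Span class and its fields are a simplified stand-in invented for this example; it is not the OpenTelemetry or OpenTracing API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Simplified, hypothetical span record; real tracing systems carry more fields.
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def causal_order(spans):
    """Return (depth, span) pairs: parents before children, siblings by start time."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    ordered = []

    def visit(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            ordered.append((depth, span))
            visit(span.span_id, depth + 1)

    visit(None, 0)  # root spans have no parent
    return ordered


spans = [
    Span("c", "a", "db", 30, 70),
    Span("a", None, "gateway", 0, 100),
    Span("b", "a", "auth", 5, 25),
]
for depth, span in causal_order(spans):
    print("  " * depth + f"{span.service}: {span.duration_ms} ms")
```

Reading the indented output top to bottom shows which service called which, and how much of the total request time each step consumed.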

Tracking RED for infrastructure

The RED method's effectiveness lies in its ability to track these aspects, which aids in identifying and resolving service or infrastructure-related problems. By giving us a solid, standardized starting point, RED makes it possible for separate teams to exchange clear information about concerns within the system, while still allowing expansion to cover unique needs and powering the drill-down needed for root cause analysis.

Learn more about RED monitoring in this presentation from .conf 2021.

Benefits & Limitations of RED Monitoring

So, what can RED do for you? Besides being an easy-to-remember acronym, RED tends to reduce decision fatigue when you are deciding how to start observing your microservices applications. Its simplicity and clarity make the learning curve short. And it gives all teams, both operations and development, a common vocabulary to discuss issues and resolutions.

RED can be extended to cover the specifics of your unique needs and usage. And by tracking the path, duration, and success of users' requests, RED can serve as a proxy for user happiness.

The method enhances problem diagnosis, allowing teams to quickly identify and address performance bottlenecks or failures.
By focusing on user experience metrics, the RED Method aligns monitoring efforts with business objectives and customer satisfaction.
It simplifies automation of monitoring and alerting, enabling more effective and proactive service management.
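Because RED reduces each service to three numbers, an alerting rule can be quite small. The sketch below is one possible shape for such a check; the function name, the use of a 95th-percentile duration, and every threshold value are assumptions chosen for illustration and would need tuning per service.

```python
def red_alerts(rate_per_s, error_ratio, p95_ms, baseline_rate_per_s,
               max_error_ratio=0.01, max_p95_ms=500, rate_tolerance=0.5):
    """Return the names of the RED signals that breach their thresholds.

    All default thresholds are hypothetical placeholders.
    """
    alerts = []
    if baseline_rate_per_s > 0:
        # Spikes and drops both count, so compare the absolute deviation.
        deviation = abs(rate_per_s - baseline_rate_per_s) / baseline_rate_per_s
        if deviation > rate_tolerance:
            alerts.append("rate")
    if error_ratio > max_error_ratio:
        alerts.append("errors")
    if p95_ms > max_p95_ms:
        alerts.append("duration")
    return alerts


print(red_alerts(100, 0.002, 180, baseline_rate_per_s=110))
print(red_alerts(20, 0.05, 900, baseline_rate_per_s=110))
```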

Limitations

The RED Method is primarily suited for request-driven applications. It might not provide comprehensive insights for batch processing or streaming applications.

Wrapping Up

The RED Method represents a focused and effective strategy for monitoring microservices and other request-driven applications, ensuring that key performance indicators align with user experience and service reliability. Its simplicity and effectiveness make it a valuable tool for modern software architectures where user satisfaction is paramount.
