As we at Splunk accelerate our cloud journey, we’re often faced with the decision of when to use logs vs metrics — a decision many in IT face. On the surface, one can do a lot by just observing logs and events. In fact, in the early days of Splunk Cloud, this is exactly how we observed everything. As we continue to grow, however, we find ourselves using a combination of both.
This post lays out the overall difference between logs and metrics and when to best utilize each. We hope that this analysis will help you create a better observability strategy for your own organization.
Almost all programs record the activities that occur within their program flow in the form of logs. These logs are generally files that can be:

- Stored locally on the machine where the program runs
- Forwarded or streamed to a centralized location

Either way, the logs are written to a file that can then be consumed by a logs search engine. The search engine collects all the logs and then presents the results of various searches to the user.
Logs are emitted from almost every program. A good logs search engine is able to handle any type of log. That makes logs the easiest and quickest data source to get visibility into the state of your system.
Within a mature observability strategy, logs are essential for unplanned research and unique situations. They are great for security use cases because many of these involve unexpected or single-event situations. For example, Splunk’s security organization utilizes logs to quickly detect and remediate significant vulnerabilities, including the log4j vulnerability disclosed in late 2021.
Logs are also great for iterative software delivery because they allow developers to establish patterns for new behaviors or functionality in production, which accelerates delivering value to customers.
(Read our log management introduction.)
It might be tempting to think that logs can solve every use case. As the amount of data grows, however, a logs-only solution will become costly and relatively slow for a small set of regular searches, usually connected to alerts. This is because the process by which logs must be categorized and batched takes much more time and is much more computationally intensive than the metrics process, which we will cover next.
A metric is a number, usually in the form of a counter or a gauge, that the developers decide is important to the observability of their system. Most software programs begin their lives emitting logs; only in the last decade has it become common for them to also emit metrics from early in their development.
We are used to metrics in our everyday life. We see them on our car’s speedometer (a gauge) or its odometer, which tracks how many miles the car has driven (a counter). The makers of our car decided that it was important for the driver to have awareness of this information while driving.
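The two basic metric types can be sketched in a few lines of code. This is an illustrative sketch with hypothetical names, not the API of any particular metrics library:

```python
class Counter:
    """A monotonically increasing value, like a car's odometer."""
    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        self.value += amount  # counters only ever go up


class Gauge:
    """A point-in-time value that can rise or fall, like a speedometer."""
    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value  # gauges can move in either direction


requests_total = Counter()  # e.g. requests served since startup
queue_depth = Gauge()       # e.g. items currently waiting

requests_total.inc()
requests_total.inc()
queue_depth.set(5)
queue_depth.set(3)  # a gauge can decrease; a counter never does
```

The distinction matters downstream: a counter answers "how much has happened so far," while a gauge answers "what is the state right now," and a time-series database treats the two differently when computing rates and averages.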
(Popular discipline-based metrics include monitoring metrics and DORA metrics for DevOps.)
For developers, often the biggest challenge to incorporating metrics is deciding which values are important enough to the observability of the system to measure, and then emitting them from early in development.
When done correctly, metrics are essential for planned scenarios and events. They deliver regular evaluations cheaply, quickly and reliably. This is because, unlike logs, they are structured in a predictable way and can therefore be saved into a time-series database, which is tuned for exactly this purpose. Operators can then quickly know where to start when investigating a degraded state in their systems.
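The "regular evaluation" pattern above can be sketched briefly. The series, threshold and window size here are hypothetical, but the shape is the point: because every sample is a fixed (timestamp, value) pair, a planned alert check is a cheap scan over a few recent numbers rather than a search over raw text:

```python
# Hypothetical CPU-utilization gauge, one sample per minute (seconds, percent).
samples = [
    (0, 41.0), (60, 47.5), (120, 83.0), (180, 91.2), (240, 88.4),
]

def window_average(series, last_n):
    """Average of the most recent last_n samples."""
    recent = [value for _, value in series[-last_n:]]
    return sum(recent) / len(recent)

THRESHOLD = 80.0  # assumed alerting threshold for this sketch

# A planned, repeatable check the operator runs on every evaluation cycle.
alert = window_average(samples, 3) > THRESHOLD
print(alert)
```

A real time-series database performs this kind of windowed evaluation over millions of series, but each individual check remains this simple, which is why metrics-based alerting stays fast and cheap as data grows.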
However, the organization must remember that reliability does not come from identifying every alert and degraded state with metrics. With modern, iterative software delivery, one must be able to debug and investigate unplanned states and rapidly incorporate those observations into the product lifecycle. This is why metrics play a critical role, though not the only one, in delivering observable, reliable IT services.
Metrics give the big picture of what is happening. If I’m driving the car, I can see the temperature of the engine and whether the coolant warning light is on (the metrics). However, if the car starts behaving outside of the norm, a mechanic might need to ask some unpredictable questions and see the actual event log of the car itself.
This is where logs and metrics differ, and we can summarize as follows: logs are flexible and unstructured, which makes them ideal for unplanned investigations and unique situations but costly and slow at scale for routine searches; metrics are structured and predictable, which makes them cheap, fast and reliable for planned, regular evaluations, but they cannot answer unpredictable questions on their own.
(Read more about machine data.)
So, when we’re asked to weigh in on the logs vs metrics debate, we say both! Logs and metrics together create a complementary observability foundation, upon which we operate the Splunk Cloud Platform. Once we establish that foundation, teams will also want to connect parts of their system together with a tracing solution.
With logs, metrics and traces, we have all the essentials to view and connect the system in both predictable and unpredictable ways. Splunk products provide a world-class implementation of these capabilities, which we ourselves use every day. We hope that our experience will help you in your journey to solve problems more easily and quickly with data.