Splunk Observability Cloud includes powerful features that automatically identify patterns in your data and surface trends. The resulting insights tell you why some customers aren’t getting an optimal experience from your application, and how you can improve it.
Unlocking these features requires attributes to be included with your application traces. But how do you know which attributes are the most valuable for your application and business?
In this article, we’ll share the best practices for deciding which attributes to collect and how they can be used in Splunk Observability Cloud to get to the root cause of an issue more quickly.
Ready to take your observability game to the next level? Let’s jump in!
Before talking about which attributes are valuable to collect, let’s take a moment to talk about what attributes are.
Attributes are key-value pairs that provide additional metadata about spans in a trace, allowing you to enrich the context of the spans you send to Splunk APM.
For example, a payment processing application would find it helpful to track:
This way, if errors or performance issues occur while processing the payment, we have the context we need for troubleshooting.
While some attributes can be added with the OpenTelemetry Collector, the ones we’ll focus on in this article are more granular and are added by application developers using the OpenTelemetry API.
For example, the following code can be used to add an attribute to the current span in a Java-based application:
import io.opentelemetry.api.trace.Span;

...

Span mySpan = Span.current();
mySpan.setAttribute("my.attribute", "value");
Please refer to Instrument your application code to add tags to spans for examples of adding attributes using other languages.
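To make this more concrete, here’s a minimal sketch of how a payment processing service might attach business context to the current span. The attribute names (order.id, customer.id, payment.amount) and the method shown are illustrative choices for this article, not standard names:

import io.opentelemetry.api.trace.Span;

public class PaymentProcessor {

    public void processPayment(String orderId, String customerId, double amount) {
        // Attach business context to the span created by auto-instrumentation.
        // The attribute names below are hypothetical examples.
        Span span = Span.current();
        span.setAttribute("order.id", orderId);
        span.setAttribute("customer.id", customerId);
        span.setAttribute("payment.amount", amount);

        // ... payment processing logic ...
    }
}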
A note about terminology before we proceed. Once OpenTelemetry attributes are ingested into Splunk Observability Cloud, they become known as tags. So when you see tags mentioned throughout this article, you can treat them as synonymous with attributes.
Now that we understand what attributes are, let’s speak about which attributes you should consider collecting from your application.
To do this, it’s helpful to understand the two primary use cases for attributes: filtering and grouping. We’ll cover both of these use cases in the sections below and explain how they map to the different types of attributes you may want to collect.
Using the Trace Analyzer capability in Splunk Observability Cloud, we can filter for traces that match a particular attribute value.
For example, we might want to filter on the order ID to find traces related to a particular order of interest:
This is particularly valuable when investigating a specific transaction or customer request.
So for this use case, we should collect attributes that help us find traces of interest in our application. For example, we might include attributes such as:
Filters can be used in other parts of Splunk Observability Cloud as well. For example, we can also filter spans within a particular trace:
Attributes used for filtering are generally high-cardinality, meaning they can have thousands or even hundreds of thousands of unique values. In fact, Splunk Observability Cloud can handle an effectively infinite number of unique attribute values! Filtering on these attributes allows us to rapidly locate the traces of interest.
With grouping, we can use the powerful Tag Spotlight feature in Splunk Observability Cloud to surface trends in the attributes we collect.
For example, suppose our application processes orders for various customer types. We could add an attribute to classify the customers by how profitable they’ve been in the past. We might also add an attribute that tells us the version of each service used to process the request.
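As a hedged sketch of what that might look like in code, the example below uses the tenant.level and version tag names referenced later in this section; the helper methods and the SERVICE_VERSION environment variable are hypothetical stand-ins for however your service looks up customer and build information:

import io.opentelemetry.api.trace.Span;

public class OrderHandler {

    public void handleOrder(String customerId) {
        Span span = Span.current();
        // Low-cardinality attributes that are well suited to grouping
        span.setAttribute("tenant.level", lookupTenantLevel(customerId));
        span.setAttribute("version", serviceVersion());

        // ... order handling logic ...
    }

    // Hypothetical helpers: a real service might read a customer record
    // and build metadata from its deployment environment.
    private String lookupTenantLevel(String customerId) {
        return "gold";
    }

    private String serviceVersion() {
        return System.getenv().getOrDefault("SERVICE_VERSION", "unknown");
    }
}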
Once these attributes are collected, we can use Tag Spotlight to discover trends that contribute to errors or high latency:
For example, looking at the tenant.level tag in the screenshot above, we can see that tenants of all levels are experiencing a high error rate.
The version tag is more interesting. We can see that 100% of the requests that were processed by version 350.10 of our service had an error, while version 350.9 of our service had no errors at all.
So the attributes we collected allowed us to quickly determine that our issue is related to the new version of the service we deployed. We can initiate a rollback of this deployment until it can be fixed.
Note: Clicking on the “Logs” button at the bottom of Tag Spotlight would let us see exactly what’s going wrong with this new version of our service, but that’s a story for another article.
Another powerful use for grouping attributes is the ability to perform breakdowns in the service map.
For example, let’s break down the payment service by the version attribute, since we know from Tag Spotlight that it’s closely related to the problem at hand.
To do this, we first click on paymentservice, then we select Breakdown -> version from the drop-down menu at the right:
Once we do this, the service map is updated dynamically to show the performance of the paymentservice by version:
This confirms what we saw earlier with Tag Spotlight: the latest version of the payment service introduced an issue that is causing a high error rate, so the deployment should be rolled back and remediated.
Attributes used for grouping should be low to medium cardinality, with at most a few hundred unique values.
Applying grouping to our trace data allows us to rapidly surface trends and identify patterns. Without the Tag Spotlight feature, we’d have to visually inspect tens or even hundreds of traces to find a pattern in the data that indicates the problem.
For your application, think about what attributes would be helpful to collect for grouping. These could include attributes such as:
Operational attributes are also helpful to collect for grouping, including attributes such as:
Note: By default, Splunk APM indexes several tags and generates a set of aggregated metrics for them, known as Troubleshooting MetricSets. For custom attributes to be used with Tag Spotlight, they first need to be indexed. See Index span tags to generate Troubleshooting MetricSets for further details.
When capturing custom attributes in your application code, you should avoid naming conflicts with attribute names already included in OpenTelemetry’s semantic conventions.
See Best practices for creating custom attributes for further guidance on choosing names for the attributes you capture in your application.
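As an illustrative sketch (the com.acme prefix and key names are hypothetical), one way to sidestep such conflicts is to define your custom attribute keys once, under a namespace you control, and reuse them throughout your code:

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;

public final class CustomAttributes {

    // Namespaced keys avoid clashing with semantic-convention names
    // such as http.request.method or db.system.
    public static final AttributeKey<String> ORDER_ID =
        AttributeKey.stringKey("com.acme.order.id");
    public static final AttributeKey<String> CUSTOMER_ID =
        AttributeKey.stringKey("com.acme.customer.id");

    private CustomAttributes() {}
}

Setting the attribute then looks like Span.current().setAttribute(CustomAttributes.ORDER_ID, orderId), and each key is defined in exactly one place.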
In this article, we outlined the best practices for deciding which attributes to collect with OpenTelemetry. Some attributes are best suited to filtering with the Trace Analyzer feature, while others are best suited to grouping and spotting trends with the Tag Spotlight feature. Both types of attributes allow issues to be triaged more quickly.
Collecting attributes aligned with the best practices in this article will let you get even more value from the data you’re sending to Splunk Observability Cloud. Now that you’ve finished reading this article, you have the knowledge you need to determine which attributes are most valuable for your own organization!
To get started adding attributes today, check out how to add attributes in various supported languages, and then how to use them to create Troubleshooting MetricSets so they can be analyzed in Tag Spotlight. For more help, feel free to ask a Splunk Expert.