Splunk Observability Cloud includes powerful features that automatically identify patterns in your data and surface trends. The resulting insights tell you why some customers aren’t getting an optimal experience from your application, and how you can improve it.
Unlocking these features requires attributes to be included with your application traces. But how do you know which attributes are the most valuable for your application and business?
In this article, we’ll share the best practices for deciding which attributes to collect and how they can be used in Splunk Observability Cloud to get to the root cause of an issue more quickly.
Ready to take your observability game to the next level? Let’s jump in!
Before talking about which attributes are valuable to collect, let’s take a moment to talk about what attributes are.
Attributes are key-value pairs that provide additional metadata about spans in a trace, allowing you to enrich the context of the spans you send to Splunk APM.
For example, a payment processing application would find it helpful to track:
This way, if errors or performance issues occur while processing the payment, we have the context we need for troubleshooting.
While some attributes can be added with the OpenTelemetry Collector, the ones we’ll focus on in this article are more granular and are added by application developers using the OpenTelemetry API.
For example, the following code can be used to add an attribute to the current span in a Java-based application:
import io.opentelemetry.api.trace.Span;

...

Span mySpan = Span.current();
mySpan.setAttribute("my.attribute", "value");
Please refer to Instrument your application code to add tags to spans for examples of adding attributes using other languages.
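To make this more concrete, here’s a minimal sketch of how a payment processing service might attach business context to the current span. The attribute names (order.id, customer.id, payment.amount) and the method shown are illustrative choices for this article, not standard names:

import io.opentelemetry.api.trace.Span;

public class PaymentProcessor {

    public void processPayment(String orderId, String customerId, double amount) {
        // Attach business context to the span created by auto-instrumentation.
        // The attribute names below are hypothetical examples.
        Span span = Span.current();
        span.setAttribute("order.id", orderId);
        span.setAttribute("customer.id", customerId);
        span.setAttribute("payment.amount", amount);

        // ... payment processing logic ...
    }
}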
A note about terminology before we proceed. Once OpenTelemetry attributes are ingested into Splunk Observability Cloud, they become known as tags. So when you see tags mentioned throughout this article, you can treat them as synonymous with attributes.
Now that we understand what attributes are, let’s speak about which attributes you should consider collecting from your application.
To do this, it’s helpful to understand the two primary use cases for attributes: filtering and grouping. We’ll cover both of these use cases in the sections below and explain how they map to the different types of attributes you may want to collect.
Using the Trace Analyzer capability in Splunk Observability Cloud, we can filter for traces that match a particular attribute value.
For example, we might want to filter on the order ID to find traces related to a particular order of interest:
This is particularly valuable when investigating a specific transaction or customer request.
So for this use case, we should collect attributes that help us find traces of interest in our application. For example, we might include attributes such as:
Filters can be used in other parts of Splunk Observability Cloud as well. For example, we can also filter spans within a particular trace:
Attributes used for filtering are generally high-cardinality, meaning they can have thousands or even hundreds of thousands of unique values. In fact, Splunk Observability Cloud can handle an effectively infinite number of unique attribute values! Filtering on these attributes allows us to rapidly locate the traces of interest.
With grouping, we can use the powerful Tag Spotlight feature in Splunk Observability Cloud to surface trends in the attributes we collect.
For example, suppose our application processes orders for various customer types. We could add an attribute to classify the customers by how profitable they’ve been in the past. We might also add an attribute that tells us the version of each service used to process the request.
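As a hedged sketch of what that might look like in code, the example below uses the tenant.level and version tag names referenced later in this section; the helper methods and the SERVICE_VERSION environment variable are hypothetical stand-ins for however your service looks up customer and build information:

import io.opentelemetry.api.trace.Span;

public class OrderHandler {

    public void handleOrder(String customerId) {
        Span span = Span.current();
        // Low-cardinality attributes that are well suited to grouping
        span.setAttribute("tenant.level", lookupTenantLevel(customerId));
        span.setAttribute("version", serviceVersion());

        // ... order handling logic ...
    }

    // Hypothetical helpers: a real service might read a customer record
    // and build metadata from its deployment environment.
    private String lookupTenantLevel(String customerId) {
        return "gold";
    }

    private String serviceVersion() {
        return System.getenv().getOrDefault("SERVICE_VERSION", "unknown");
    }
}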
Once these attributes are collected, we can use Tag Spotlight to discover trends that contribute to errors or high latency:
For example, looking at the tenant.level tag in the screenshot above, we can see that tenants of all levels are experiencing a high error rate.
The version tag is more interesting. We can see that 100% of the requests that were processed by version 350.10 of our service had an error, while version 350.9 of our service had no errors at all.
So the attributes we collected allowed us to quickly determine that our issue is related to the new version of the service we deployed. We can initiate a rollback of this deployment until it can be fixed.
Note: Clicking on the “Logs” button at the bottom of Tag Spotlight would let us see exactly what’s going wrong with this new version of our service, but that’s a story for another article.
Another powerful use for grouping attributes is the ability to perform breakdowns in the service map.
For example, let’s break down the payment service by the version attribute, since we know from Tag Spotlight that it’s closely related to the problem at hand.
To do this, we first click on paymentservice, then we select Breakdown -> version from the drop-down menu at the right:
Once we do this, the service map is updated dynamically to show the performance of the paymentservice by version:
This confirms what we saw earlier with Tag Spotlight: the latest version of the payment service introduced an issue that is causing a high error rate, so the deployment should be rolled back and remediated.
Attributes used for grouping should be low to medium cardinality, with at most a few hundred unique values.
Applying grouping to our trace data allows us to rapidly surface trends and identify patterns. Without the Tag Spotlight feature, we’d have to visually inspect tens or even hundreds of traces to find a pattern in the data that indicates the problem.
For your application, think about what attributes would be helpful to collect for grouping. These could include attributes such as:
Operational attributes are also helpful to collect for grouping, including attributes such as:
Note: By default, Splunk APM indexes several tags and generates a set of aggregated metrics for them, known as Troubleshooting MetricSets. For custom attributes to be used with Tag Spotlight, they first need to be indexed. See Index span tags to generate Troubleshooting MetricSets for further details.
When capturing custom attributes in your application code, you should avoid naming conflicts with attribute names already included in OpenTelemetry’s semantic conventions.
See Best practices for creating custom attributes for further guidance on choosing names for the attributes you capture in your application.
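As an illustrative sketch (the com.acme prefix and key names are hypothetical), one way to sidestep such conflicts is to define your custom attribute keys once, under a namespace you control, and reuse them throughout your code:

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;

public final class CustomAttributes {

    // Namespaced keys avoid clashing with semantic-convention names
    // such as http.request.method or db.system.
    public static final AttributeKey<String> ORDER_ID =
        AttributeKey.stringKey("com.acme.order.id");
    public static final AttributeKey<String> CUSTOMER_ID =
        AttributeKey.stringKey("com.acme.customer.id");

    private CustomAttributes() {}
}

Setting the attribute then looks like Span.current().setAttribute(CustomAttributes.ORDER_ID, orderId), and each key is defined in exactly one place.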
In this article, we outlined the best practices for deciding which attributes to collect with OpenTelemetry. Some attributes are best suited to filtering with the Trace Analyzer feature, while others are best suited to grouping and spotting trends with the Tag Spotlight feature. Both types of attributes allow issues to be triaged more quickly.
Collecting attributes aligned with the best practices in this article will let you get even more value from the data you’re sending to Splunk Observability Cloud. Now that you’ve finished reading this article, you have the knowledge you need to determine which attributes are most valuable for your own organization!
To get started adding attributes today, check out how to add attributes in various supported languages, and then how to use them to create Troubleshooting MetricSets so they can be analyzed in Tag Spotlight. For more help, feel free to ask a Splunk Expert.