Gaining control over complex distributed systems means understanding many different indicators of performance. One key to understanding these systems is adding cardinality to our metrics, which provides further information about our distributed systems’ overall health and performance. Developers rely on the telemetry captured from these distributed workloads to determine what really went wrong and to place it in context.
OpenTelemetry allows us to easily capture metrics from our applications and add custom dimensions for later analysis. In this post, I will explain how to use annotations to add contextual information about your distributed workloads to your captured measurements. For example, you can add a version annotation to a metric to trivially find all requests made by one particular version anywhere in your application.
OpenTelemetry data pipelines are built with the OpenTelemetry Collector, which is responsible for aggregating workload telemetry and exporting this data to an analysis system such as Splunk or an open-source one like Prometheus. I’ll provide a brief introduction to annotations and the configuration of the OpenTelemetry Collector below.
Annotations, also known as tags, are key-value pairs of data associated with recorded measurements; they provide contextual information and let you distinguish and group metrics during analysis and inspection. When measurements are aggregated into metrics, annotations are used as labels to break the metrics down. Let’s take a look at real examples of adding annotations using the Splunk distribution of the OpenTelemetry Collector.
The OpenTelemetry Collector configuration file is written in YAML, and a full pipeline contains the following components: receivers, processors, and exporters.
Each of these components is defined within its respective section and must also be enabled within the service (pipelines) section.
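For orientation, here is a minimal sketch of that overall structure. The otlp receiver, batch processor, and sapm exporter shown here are illustrative placeholders rather than part of the examples that follow; your distribution’s defaults may differ.

receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  sapm:
    access_token: "${SPLUNK_ACCESS_TOKEN}"
    endpoint: "https://ingest.us0.signalfx.com/v2/trace"   # us0 realm used only as an example

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [sapm]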
Adding a deployment environment to our workloads is simple: it only requires adding the resource/add_environment processor to the Splunk OpenTelemetry Collector’s configuration file. The resource/add_environment processor adds the deployment.environment annotation to all spans to help you quickly identify your workloads within your analysis system, like Splunk APM.
The addition to the processors section of the configuration file below sets the deployment.environment annotation to CloudProduction, our specific deployment environment.
processors:
  resourcedetection:
    detectors: [system,env,gce,ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: CloudProduction
        key: deployment.environment
We then enable the resource/add_environment processor in the pipelines section of the configuration file for our traces and logs.
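As a rough sketch, the relevant part of the service section might look like the following. The otlp receiver and the sapm and splunk_hec exporters are assumptions standing in for whatever receivers and exporters your configuration already defines.

service:
  pipelines:
    traces:
      receivers: [otlp]
      # resourcedetection runs first, then resource/add_environment stamps deployment.environment
      processors: [resourcedetection, resource/add_environment, batch]
      exporters: [sapm]        # assumed trace exporter
    logs:
      receivers: [otlp]
      processors: [resourcedetection, resource/add_environment, batch]
      exporters: [splunk_hec]  # assumed log exporter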
With this configuration in place, the Splunk APM console now shows the CloudProduction annotation and lets you filter throughout the backend by which environment handled each request. This annotation is part of the default Troubleshooting MetricSets, which Splunk APM indexes automatically.
In addition to the deployment environment, any other annotation can be added to help identify application performance bottlenecks. This can be done using the attributes/newenvironment processor, which adds an annotation to any span that doesn’t already have it. This is particularly useful for adding metadata to your spans, like version numbers or the deployment color when using blue/green deployments. Implementing the attributes/newenvironment processor works the same way as the resource/add_environment processor or any other OpenTelemetry Collector processor. Let’s illustrate this with another example showing what the attributes/newenvironment processor and the resource/add_environment processor look like as part of the same configuration.
In the configuration file below, you can see the attributes/newenvironment processor added to the previous configuration to include both the version of our microservice application and deployment color.
processors:
  resourcedetection:
    detectors: [system,env,gce,ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: CloudProduction
        key: deployment.environment
  attributes/newenvironment:
    actions:
      - key: version
        value: "v1.0.1"
        action: insert
      - key: deploymentcolor
        value: "green"
        action: insert
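As before, the new processor takes effect only after it is referenced in the pipelines section. A sketch of the traces pipeline with both processors enabled might look like this, with the receiver and exporter names again being assumed placeholders:

service:
  pipelines:
    traces:
      receivers: [otlp]
      # annotations are added in list order: the environment first, then version and deployment color
      processors: [resourcedetection, resource/add_environment, attributes/newenvironment, batch]
      exporters: [sapm]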
When we look at the trace in Splunk APM, we see that the version and deployment color are now included as part of each span collected for our microservice application.
Adding annotations to our spans adds cardinality to our telemetry, ultimately allowing us to better understand our application and get answers to what went wrong and why. For example, with Splunk APM, we can create MetricSets, which are categories of metrics about traces and spans that you can use for real-time monitoring and troubleshooting. MetricSets are specific to Splunk but are effectively aggregates of metrics and metric time series, enabling you to populate charts and generate alerts. Creating custom MetricSets from the annotations in our examples allows us to use specific filters to narrow down any bottleneck affecting application performance. For example, with Splunk Infrastructure Monitoring, we can narrow down all hosts belonging to a given application environment, such as a region or datacenter. The screenshot below shows how we used the annotation for our deployment environment, CloudProduction, as a filter to create a custom dashboard showing all hosts within the CloudProduction environment.
Since all of our data is tagged with these annotations and created as MetricSets, we can also use them within Splunk APM. You can see from the example screenshots below that the annotations are now available as part of Splunk APM’s Tag Spotlight and Dynamic Service Map.
This lets you filter your application telemetry by these annotations, get a clear map of service dependencies, and find the granular trends contributing to possible application performance issues. Overall, adding custom annotations to your traces helps you narrow your data down to what best fits your application's development and deployment, ultimately reducing your MTTR.
With the ability to annotate metrics in the way that best fits your organization, you can locate what you're looking for within your cloud-native deployments far more quickly, and you no longer need to worry about limitations in identifying just where application bottlenecks may be.
Want to try working with OpenTelemetry yourself? You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Get a real-time view of your infrastructure and start solving problems with your microservices faster today.
----------------------------------------------------
Thanks!
Johnathan Campos