When managing distributed environments, we constantly find ourselves looking for better ways to understand performance. Telemetry data is critical to solving this challenge, helping DevOps and IT teams understand these systems’ behavior and performance. To get the most from telemetry data, it must be captured, analyzed, and tagged to add relevant context, all while maintaining the security and efficiency of user and business data. The OpenTelemetry Collector and its processing capabilities can help you manipulate data before it’s sent to an observability system.
At its core, Splunk Observability Cloud uses the OpenTelemetry observability framework. Most other observability vendors can also consume OpenTelemetry data. OpenTelemetry offers vendor-agnostic APIs and software development kits (SDKs) for collecting telemetry data from cloud-native applications and their supporting infrastructure, and the OpenTelemetry Collector can aggregate and send this data to a commercial analysis system like Splunk or open-source ones like Prometheus. The collector uses pipelines to receive, process, and export metric and trace data with components known as receivers, processors, and exporters. Let’s dive further to demonstrate what processors can do with your application’s telemetry to achieve better security and efficiency.
In the OpenTelemetry workflow for a trace event, the trace is generated by the application, received by a receiver, and manipulated by a processor before being exported by an exporter. The Splunk Distribution of the OpenTelemetry Collector supports a variety of processors for different use cases. For example, the attributes processor modifies the attributes of a span or log record using supported actions specified in the collector’s configuration. Actions are taken on specific tags within the spans sent to the backend service.
Suppose we’re using auto-instrumentation for workloads that emit telemetry that must be secured. For example, if your customer ID tag is the customer’s email address, you may not want that address stored in your observability system. In that case, the hash action is a perfect choice, using SHA1 to hash the contents of an attribute exported to Splunk Observability Cloud. If you use this action, the sensitive data is converted into a string that you can store without worry, as the hashing operation is not reversible. The delete action is another great option for completely removing the attribute from the telemetry exported to the service.
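For instance, a minimal sketch that hashes a hypothetical customer.email attribute might look like this:

processors:
  attributes:
    actions:
      # Replace the raw email address with its non-reversible SHA-1 digest.
      - action: hash
        key: "customer.email"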
The batch processor is another great example. It accepts spans, metrics, or logs and places them into batches, which compresses the data better and reduces the number of outgoing connections required to transmit it. It is highly recommended to configure the batch processor on every collector to improve the overall efficiency of the data sent to an observability system.
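A minimal sketch of a batch processor configuration follows; the values shown match the processor’s defaults and can be omitted entirely:

processors:
  batch:
    # Flush a batch after this interval, even if it isn't full.
    timeout: 200ms
    # Send a batch once it holds this many spans, metric data points, or log records.
    send_batch_size: 8192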
To implement processors, we must first understand the basics of the OpenTelemetry Collector configuration file. The configuration file is written in YAML and composed of receivers, processors, exporters, and (optionally) extensions, which are assembled into pipeline definitions.
Each of these components is defined in its respective section and then enabled in the service section. The example below shows each of the component configurations: receivers, processors, and exporters. No extensions are defined in this example. Under the component definitions is the service section, where for traces, otlp is used as the receiver, batch as the processor, and sapm as the exporter. This is the recommended configuration for Splunk Observability Cloud, but by changing the exporter, data can be sent to other platforms as well.
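As a sketch, that structure looks like the following (YOUR_TOKEN and the us0 realm are placeholders for your own values):

# Component definitions.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  sapm:
    access_token: YOUR_TOKEN
    endpoint: https://ingest.us0.signalfx.com/v2/trace

# The service section enables the components in a pipeline.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [sapm]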
Let’s illustrate this with a real example. In the configuration below, the attributes processor is defined to hash the value of any key named “ssn”: we use the hash action and identify the key “ssn” in the appropriate YAML format.
# Define each component.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:55681

processors:
  batch:
  # Definition for the attributes processor. Hash the value of any "ssn" key.
  attributes:
    actions:
      - action: hash
        key: "ssn"

exporters:
  sapm:
    access_token: YOUR_TOKEN
    endpoint: https://ingest.us0.signalfx.com/v2/trace

# Enable the components.
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Attributes processor enabled, alongside the recommended batch processor.
      processors: [batch, attributes]
      exporters: [sapm]
Without the processor in place, the span shows the user’s SSN in plain text.
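For illustration, the attribute on such a span might look like this:

ssn: 123-45-6789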
With the processor in place, the span now shows the user’s SSN in a consistent but secure and unreadable format. Any trace with the SSN 123-45-6789 will use the same hashed value in your observability system. (Note: for various reasons, we strongly recommend that you do not rely on this for truly sensitive data like SSNs. The best practice with data of that sensitivity is to delete it using the processor’s delete action, and to instead emit a different tag from your application.)
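A sketch of that best practice, using the delete action instead of hash:

processors:
  attributes:
    actions:
      # Remove the sensitive attribute entirely before export.
      - action: delete
        key: "ssn"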
The Splunk OpenTelemetry Collector is configured using agent_config.yaml, located in /etc/otel/collector/ on Linux (Debian/RPM) and \ProgramData\Splunk\OpenTelemetry Collector\ on Windows. By default, it contains the recommended starting configuration for most environments.
If you’re using a different distribution, the configuration file may live elsewhere, but it can be modified in the same way using whatever processors your vendor provides.
Processors are very useful for modifying span attributes, compressing telemetry data, and including or excluding metrics within your telemetry. Keep in mind that several processors are enabled by default with the Splunk OpenTelemetry Collector to work best with the service. Depending on the data source and your requirements, enabling additional, optional processors may be worthwhile.
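For example, including or excluding metrics is the job of the filter processor. A sketch that drops a single metric by name might look like this (the metric name is just an example):

processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          # Example metric to drop from the pipeline.
          - system.paging.faults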
Want to try working with Splunk Observability Cloud yourself? You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Get a real-time view of your infrastructure and start solving problems with your microservices faster today.
If you’re an existing customer who wants to learn more about OpenTelemetry setup, check out our documentation.
----------------------------------------------------
Thanks!
Johnathan Campos