OpenTelemetry offers vendor-agnostic APIs, software development kits (SDKs), agents, and other tools for collecting telemetry data from cloud-native applications and their supporting infrastructure so you can understand their performance and health. As the open standard for collecting telemetry to be analyzed by backend platforms like Splunk, OpenTelemetry is about owning and controlling your own data, so it is no surprise that many organizations have adopted it as part of their observability framework for cloud-native software. Additionally, several popular open-source applications and middleware now ship with OpenTelemetry instrumentation built in.
With OpenTelemetry top of mind for many, I would like to touch on a few tips to help you rapidly and confidently carry out your OpenTelemetry deployment.
Note: While many of these tips are specific to the Splunk distribution of the OpenTelemetry Collector, most of them also apply to the upstream OpenTelemetry Collector.
One of OpenTelemetry’s most widely used components is the Collector, an agent most commonly run on each host or Kubernetes cluster. The Collector can capture system metrics, data emitted by OpenTelemetry SDKs and other components, and telemetry from other sources such as Prometheus and Zipkin clients.
When deploying the OpenTelemetry Collector, planning the right configuration is essential for a successful deployment. The OpenTelemetry Collector configuration file describes the data pipeline used to collect metrics, traces, and logs. It’s simple YAML, and defines the following components:
Receivers, which get data into the Collector
Processors, which transform or filter data as it flows through the pipeline
Exporters, which send data on to one or more backends
Extensions, which provide capabilities such as health checks and diagnostics
Each of these components is defined within its respective section and then must also be enabled within the service (pipeline) section.
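As a minimal illustration, here is a sketch of a Collector configuration wiring those sections together — it receives OTLP data, batches it, and writes it to the Collector’s own log (the component names are standard, but treat the endpoint value as an assumption for your environment):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # default OTLP gRPC port

processors:
  batch:                          # groups telemetry into batches before export

exporters:
  logging:                        # writes received data to the Collector log
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]           # components defined above do nothing
      processors: [batch]         # until they are enabled in a pipeline
      exporters: [logging]
```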
If you plan on using the Splunk distribution of OpenTelemetry, consider using the Splunk OpenTelemetry Configurator. Today, several Splunk-distribution-only components are included in the Configurator and can’t be turned off, but they are appropriate for most configurations. Through an easy-to-use UI, the Configurator automatically constructs a YAML file with each component the OpenTelemetry Collector requires. It offers configuration options for both standalone and Kubernetes deployments of the Collector, with a clear view of the diffs between the standard configuration and your customized one. With minimal knowledge of YAML required, you can easily get started with OpenTelemetry and quickly deploy the configuration best suited to your organization.
Here are some common issues we’ve seen customers run into when setting up their OpenTelemetry pipelines, and how to fix them:
Associating a deployment environment with your workloads can be helpful when narrowing down application bottlenecks across multiple environments. There are several ways to ensure your backend service (such as Splunk) displays the correct application environment.
Option 1: Include an environment variable on the host system running the OpenTelemetry Collector.
For Linux: Run the following command.
export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv'
For Kubernetes: Inject the environment variable into the container’s configuration by adding it under .spec.template.spec.containers.env in your deployment.yaml:
...
spec:
  template:
    spec:
      containers:
        - env:
            - name: SPLUNK_OTEL_AGENT
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(SPLUNK_OTEL_AGENT):4317"
            - name: OTEL_SERVICE_NAME
              value: "<serviceName>"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=ProductionEnv"
          image: my-image
          name: myapp
...
For Windows: Set the environment variable in PowerShell:
$env:OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv'
Option 2: Include the deployment environment as part of the OpenTelemetry configuration file.
Use the resource/add_environment processor to add the deployment.environment tag to all captured spans.
The addition to the processors section of the configuration file below sets ProductionEnv as the deployment environment.
processors:
  resourcedetection:
    detectors: [system, env, gce, ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: ProductionEnv
        key: deployment.environment
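Remember that defining the processor is only half the job — it must also be referenced in a pipeline before it takes effect. As a sketch, the service section would include it like this (the receivers and exporters shown are assumptions based on a typical default configuration):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]   # assumed receiver
      # resource/add_environment runs after resource detection
      processors: [memory_limiter, batch, resourcedetection, resource/add_environment]
      exporters: [sapm, signalfx]   # assumed exporters
```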
To quickly extract the running configuration from a host actively running the OpenTelemetry Collector, query the following URL:
curl http://localhost:55554/debug/configz/effective
Note that the output redacts secure information like tokens and passwords stored within the configuration file.
exporters:
  logging:
    loglevel: debug
  otlp:
    endpoint: :4317
    tls:
      insecure: true
  sapm:
    access_token: <redacted>
    endpoint: https://ingest.us1.signalfx.com/v2/trace
  signalfx:
    access_token: <redacted>
    api_url: https://api.us1.signalfx.com
    correlation: null
    ingest_url: https://ingest.us1.signalfx.com
    sync_host_metadata: true
  splunk_hec:
    endpoint: https://ingest.us1.signalfx.com/v1/log
    source: otel
    sourcetype: otel
    token: <redacted>
To confirm the OpenTelemetry Collector is successfully collecting and exporting data, use zPages along with the logging exporter. By default, the Splunk OpenTelemetry Collector does not have zPages enabled. To enable it, navigate to the location of your configuration file:
For Linux:
/etc/otel/collector/
For Windows:
\ProgramData\Splunk\OpenTelemetry Collector\agent_config.yaml
Uncomment the zpages endpoint by removing the “#” from the configuration file, then restart the OpenTelemetry Collector service to apply the change.
zpages:
  #endpoint: 0.0.0.0:55679
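Note that the zpages extension must also be listed under service.extensions to run; in the default Splunk configuration it already is. After uncommenting, the relevant lines look roughly like this:

```yaml
extensions:
  zpages:
    endpoint: 0.0.0.0:55679   # "#" removed from this line

service:
  extensions: [health_check, http_forwarder, zpages, memory_ballast]
```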
Note: It is recommended to always back up the active configuration file before making changes.
Now that zPages is enabled, navigate to the following URL in a web browser to view actively captured trace spans:
http://localhost:55679/debug/tracez
Note: If viewing on a remote machine, replace “localhost” with the IP address of the host machine. Example: http://192.168.86.20:55679/debug/tracez
Select a latency sample associated with one of your enabled exporters to view a snapshot of the data collected by your collector.
[Image: Example zPages troubleshooting page showing a snapshot of collected and exported data]
Another great way to verify that your Collector is collecting and exporting data is to enable the logging exporter. To do so, return to the OpenTelemetry Collector’s configuration file and enable the logging exporter as part of your traces and logs pipelines. In the example below, the logging exporter was added to an existing configuration file:
service:
  extensions: [health_check, http_forwarder, zpages, memory_ballast]
  pipelines:
    traces:
      receivers: [jaeger, otlp, smartagent/signalfx-forwarder, zipkin]
      processors:
        - memory_limiter
        - batch
        - resourcedetection
        - resource/add_environment
        - attributes/newenvironment
      exporters: [sapm, signalfx, logging]
      # Use instead when sending to gateway
      #exporters: [otlp, signalfx]
    metrics:
      receivers: [hostmetrics, otlp, signalfx, smartagent/signalfx-forwarder]
      processors: [memory_limiter, batch, resourcedetection]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    metrics/internal:
      receivers: [prometheus/internal]
      processors: [memory_limiter, batch, resourcedetection/internal]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    logs/signalfx:
      receivers: [signalfx]
      processors: [memory_limiter, batch]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    logs:
      receivers: [fluentforward, otlp]
      processors:
        - memory_limiter
        - batch
        - resourcedetection
        - resource/add_environment
        - attributes/newenvironment
      exporters: [splunk_hec, logging]
With the logging exporter now enabled, restart the OpenTelemetry Collector service to apply the change.
Now that you have the logging exporter configured, use journalctl on your Linux hosts or Event Viewer on your Windows hosts to confirm the structure of your collected data. Let’s take a look at an example of exported metrics on a Linux host running the OpenTelemetry collector.
Using journalctl, run one of the following commands to begin viewing the metrics emitted by the logging exporter:
journalctl -u otel-collector -f

# For the Splunk distribution:
journalctl -u splunk-otel-collector.service -f
The terminal now shows the exported metrics and their corresponding metadata, letting you confirm that the Collector’s configuration and metadata are as expected before sending any data to your backend system.
OpenTelemetry has changed how organizations are making their cloud-native workloads observable. I hope that these tips can help you become more successful in your OpenTelemetry journey.
Want to try working with OpenTelemetry yourself? You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Get a real-time view of your infrastructure and start solving problems with your microservices faster today. If you’re an existing customer who wants to learn more about OpenTelemetry setup, check out our documentation.
----------------------------------------------------
Thanks!
Johnathan Campos