Tips for a Successful OpenTelemetry Deployment


With OpenTelemetry top of mind for many, I would like to touch on a few tips to help you rapidly and confidently carry out your OpenTelemetry deployment.
Note: While many of these tips are specific to the Splunk distribution of the OpenTelemetry collector, they still partially apply to the mainline version of the OpenTelemetry collector.
The OpenTelemetry Data Pipeline
One of OpenTelemetry’s most widely used components is the Collector, an agent that is most commonly run on each host or Kubernetes cluster. The Collector can capture system metrics, data emitted from OpenTelemetry SDKs and other components, and telemetry from other sources like Prometheus and Zipkin clients.
When deploying the OpenTelemetry Collector, planning for the best configuration is essential for a successful deployment. The OpenTelemetry Collector configuration file describes the data pipeline used to collect metrics, traces, and logs. It’s simple YAML, and defines the following:
- Receivers: How to get data in. Receivers can be push or pull-based.
- Processors: What to do with received data.
- Exporters: Where to send received data. Exporters can be push or pull-based.
- Extensions: Provide capabilities on top of the primary functionality of the collector.
Each of these components is defined within their respective section and then also must be enabled within the service (pipeline) section.
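As a sketch, a minimal agent configuration illustrating this structure might look like the following (the component names are standard upstream ones; the endpoint is illustrative):

```yaml
# Minimal illustrative Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # example listen address

processors:
  batch:

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]       # components are defined above...
      processors: [batch]     # ...but only become active once
      exporters: [logging]    # listed here in a pipeline
```

A component that is defined but never referenced in a pipeline is simply ignored, which is a common source of "why is nothing happening?" confusion.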
If you plan on using the Splunk distribution of OpenTelemetry, consider using the Splunk OpenTelemetry Configurator. Today, several Splunk-distribution-only components are included and can’t be turned off in the configurator, but they are suitable for most configurations. The configurator helps you by automatically constructing a YAML file with each component required by the OpenTelemetry Collector through an easy-to-use UI. It offers configuration options for both standalone and Kubernetes deployments of the Collector, with a clear view of the diffs between the standard configuration and your customized configuration. With minimal knowledge of YAML required, you can easily get started with OpenTelemetry and quickly deploy the configuration best suited for your organization.
Troubleshooting
Here are some common issues we’ve seen customers run into when setting up their OpenTelemetry pipelines, and how to fix them:
Metrics Are Not Showing the Correct Deployment Environment
Having your deployment environment associated with your workloads can be helpful when trying to narrow down application bottlenecks within multiple environments. There are several ways to ensure your backend service (like Splunk) displays the correct application environment.
Option 1: Include an environmental variable on your host system running the OpenTelemetry collector.
For Linux: Run the following command.
export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv'
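You can also set several resource attributes at once as a comma-separated list; for example (the attribute values here are illustrative):

```shell
# Set resource attributes for processes launched from this shell
# (values are illustrative)
export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv,service.version=1.2.3'

# Confirm the variable is visible to child processes
echo "$OTEL_RESOURCE_ATTRIBUTES"
```

Add the export to a shell profile or the collector's service environment file if you need it to survive a reboot.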
For Kubernetes: Inject the OTEL_RESOURCE_ATTRIBUTES environment variable into the container’s configuration by adding it under .spec.template.spec.containers.env in your deployment.yaml:
...
spec:
  template:
    spec:
      containers:
        - env:
            - name: SPLUNK_OTEL_AGENT
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(SPLUNK_OTEL_AGENT):4317"
            - name: OTEL_SERVICE_NAME
              value: "<serviceName>"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=ProductionEnv"
          image: my-image
          name: myapp
...
For Windows: Run the following command in PowerShell.
$env:OTEL_RESOURCE_ATTRIBUTES='deployment.environment=ProductionEnv'
Option 2: Include the deployment environment as part of the OpenTelemetry configuration file.
Use the resource/add_environment processor to add the deployment.environment tag to all captured spans.
The snippet below shows the addition to the processors section of the configuration file, setting ProductionEnv as the deployment environment.
processors:
  resourcedetection:
    detectors: [system, env, gce, ec2]
    override: true
  resource/add_environment:
    attributes:
      - action: insert
        value: ProductionEnv
        key: deployment.environment
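As with any processor, resource/add_environment only takes effect once it is referenced in a pipeline under the service section. A sketch (the other component names are illustrative):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # resource/add_environment must appear here to run
      processors: [memory_limiter, batch, resourcedetection, resource/add_environment]
      exporters: [signalfx]
```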
How Can I View and Share My Configuration Securely for Easy Troubleshooting?
To quickly extract the running configuration from a host actively running the OpenTelemetry Collector, run the following command.
curl http://localhost:55554/debug/configz/effective
Note that the output redacts secure information like tokens and passwords stored within the configuration file.
exporters:
  logging:
    loglevel: debug
  otlp:
    endpoint: :4317
    tls:
      insecure: true
  sapm:
    access_token: <redacted>
    endpoint: https://ingest.us1.signalfx.com/v2/trace
  signalfx:
    access_token: <redacted>
    api_url: https://api.us1.signalfx.com
    correlation: null
    ingest_url: https://ingest.us1.signalfx.com
    sync_host_metadata: true
  splunk_hec:
    endpoint: https://ingest.us1.signalfx.com/v1/log
    source: otel
    sourcetype: otel
    token: <redacted>
How Can I Confirm the OpenTelemetry Collector is Collecting Data?
To confirm the OpenTelemetry collector is successfully collecting and exporting data, use zPages along with the logging exporter. By default, the Splunk OpenTelemetry collector does not have zPages enabled. To enable it, navigate to the location of your configuration file:
For Linux:
/etc/otel/collector/
For Windows:
\ProgramData\Splunk\OpenTelemetry Collector\agent_config.yaml
Uncomment the zpages endpoint by removing the “#” from the configuration file, then restart the OpenTelemetry collector service to apply the change. The result should look like this:
zpages:
  endpoint: 0.0.0.0:55679
Note: It is recommended to always back up the active configuration file before making changes.
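Keep in mind that, like any extension, zpages must also be listed under service.extensions to take effect; in the default Splunk configuration it already is, so uncommenting the endpoint is enough. As a sketch:

```yaml
extensions:
  zpages:
    endpoint: 0.0.0.0:55679   # zPages debug UI listen address

service:
  extensions: [health_check, zpages]   # extension is active only when listed here
```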
Now that zPages has been enabled, use a web browser to navigate to the following URL and view actively captured trace spans:
http://localhost:55679/debug/tracez
Note: If viewing on a remote machine, replace “localhost” with the IP address of the host machine. Example: http://192.168.86.20:55679/debug/tracez
Select a latency sample associated with one of your enabled exporters to view a snapshot of the data collected by your collector.
[Image: example zPages troubleshooting page showing a snapshot of collected and exported data]
Another great way to verify that your collector is collecting and exporting data is to enable the logging exporter. To do so, return to the OpenTelemetry collector’s configuration file and enable the logging exporter as part of your traces and logs pipelines. Here is an example where the logging exporter was added to an existing configuration file; note the logging entries in the exporters lists.
service:
  extensions: [health_check, http_forwarder, zpages, memory_ballast]
  pipelines:
    traces:
      receivers: [jaeger, otlp, smartagent/signalfx-forwarder, zipkin]
      processors:
        - memory_limiter
        - batch
        - resourcedetection
        - resource/add_environment
        - attributes/newenvironment
      exporters: [sapm, signalfx, logging]
      # Use instead when sending to gateway
      #exporters: [otlp, signalfx]
    metrics:
      receivers: [hostmetrics, otlp, signalfx, smartagent/signalfx-forwarder]
      processors: [memory_limiter, batch, resourcedetection]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    metrics/internal:
      receivers: [prometheus/internal]
      processors: [memory_limiter, batch, resourcedetection/internal]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    logs/signalfx:
      receivers: [signalfx]
      processors: [memory_limiter, batch]
      exporters: [signalfx]
      # Use instead when sending to gateway
      #exporters: [otlp]
    logs:
      receivers: [fluentforward, otlp]
      processors:
        - memory_limiter
        - batch
        - resourcedetection
        - resource/add_environment
        - attributes/newenvironment
      exporters: [splunk_hec, logging]
With the logging exporter now enabled, restart the OpenTelemetry collector service to apply the change.
Now that you have the logging exporter configured, use journalctl on your Linux hosts or Event Viewer on your Windows hosts to confirm the structure of your collected data. Let’s take a look at an example of exported metrics on a Linux host running the OpenTelemetry collector.
Run one of the following journalctl commands to begin viewing the metrics emitted by the logging exporter.
journalctl -u otel-collector -f
journalctl -u splunk-otel-collector.service -f   # For the Splunk distribution
The terminal now shows the exported metrics and their corresponding metadata, letting you confirm that the collector’s configuration and metadata are as expected before sending any data to your backend system.
Conclusion
OpenTelemetry has changed how organizations are making their cloud-native workloads observable. I hope that these tips can help you become more successful in your OpenTelemetry journey.
Want to try working with OpenTelemetry yourself? You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Get a real-time view of your infrastructure and start solving problems with your microservices faster today. If you’re an existing customer who wants to learn more about OpenTelemetry setup, check out our documentation.
----------------------------------------------------
Thanks!
Johnathan Campos