If you’re like most organizations, you’re leveraging Jenkins for all sorts of things: deployment pipelines, automated API tests, and even glorified cron jobs, to name a few.
Tracing and APM for Jenkins recently became much more straightforward thanks to the OpenTelemetry project and the OpenTelemetry Jenkins Plugin (maintained by Cyrille Le Clerc). Once configured, a single click can take you from your Jenkins job into a detailed waterfall chart of the entire pipeline run!
By combining OpenTelemetry (OTEL), Jenkins, and Splunk APM, you can use the granularity of distributed tracing to understand specifics of your Jenkins usage that were previously difficult to uncover, while keeping full control of your data.
APM (or distributed tracing for those historically inclined) is a powerful tool for understanding interactions over the entire lifespan of a given process; in this case, Jenkins deployments. Not only does it give you a nifty waterfall chart of where time was spent in each step of a Jenkins deployment, but it also provides additional data to aggregate with more common time series metrics and traditional build logs. Various parts of your organization may benefit from Jenkins trace data in unexpected ways:
Build queue times starting to become excruciatingly long? Quickly identify builds and steps holding up Jenkins for unusual amounts of time.
Noticing a slow increase in the time it takes to run pipelines across your organization? Send your Jenkins APM data through Splunk Log Observer to emit time series metrics of all steps and easily visualize increased (or decreased) time spent on various steps across all jobs even after your Jenkins data has aged out of APM.
Are calls to external services taking longer than average? Perhaps git checkout takes longer than average or a given API’s response has become slower over time. Splunk APM’s Tag Spotlight can help visualize lengthy calls to external services in your pipeline with P50, P90, and P99 values.
Want to know when another Team’s builds are happening that may impact your service? Set up a detector on their deployments and have an event marker show up on your dashboards to quickly establish if their deployment has impacted your service’s performance.
With Jenkins, Splunk APM can address these concerns quickly in one place without being overwhelmed by tool sprawl. There is no need to utilize multiple tools and jump between different interfaces for Jenkins, logging, and monitoring data to understand what's really going on.
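To make the P50, P90, and P99 values mentioned above more concrete, here is a minimal Python sketch of how per-step percentiles could be derived from span durations. It is only an illustration: the `SPANS` records, the step names, and the `percentile` and `summarize` helpers are all hypothetical, and the nearest-rank method used here is an assumption, not a description of how Splunk APM or Tag Spotlight computes its values.

```python
from collections import defaultdict
from math import ceil

# Hypothetical span records: (step name, duration in seconds).
# Real Jenkins trace data would carry many more attributes.
SPANS = [
    ("git checkout", 4.0),
    ("git checkout", 5.0),
    ("git checkout", 6.0),
    ("git checkout", 30.0),
    ("unit tests", 60.0),
    ("unit tests", 62.0),
    ("unit tests", 61.0),
    ("unit tests", 65.0),
]


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(spans):
    """Group span durations by step name and report P50, P90, and P99."""
    by_step = defaultdict(list)
    for step, seconds in spans:
        by_step[step].append(seconds)
    return {
        step: {f"p{p}": percentile(durations, p) for p in (50, 90, 99)}
        for step, durations in by_step.items()
    }


if __name__ == "__main__":
    for step, stats in summarize(SPANS).items():
        print(step, stats)
```

In this made-up sample, the single 30-second checkout leaves the P50 for `git checkout` at 5 seconds but pulls the P90 and P99 up to 30, which is the kind of tail behavior an average would hide.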
To get set up, check out the GitHub repository for OpenTelemetry Collector configuration examples, documentation, and two Splunk Observability Cloud dashboard exports to get you started. Armed with these artifacts and an OpenTelemetry Collector, you’ll quickly be able to provide more detailed Jenkins insights for IT Operations, CI/CD teams, and DevOps professionals.
Figure 1-1. Get detailed Jenkins pipeline metrics with Jenkins APM data
The GitHub repository linked as part of this blog includes two dashboards meant to help you understand specific Jenkins pipelines as well as overall Jenkins health. They can be used as-is or as a starting point for building your own more detailed deployment dashboards.
Also included in the GitHub repository are instructions and SignalFlow for setting up a detector to notify you of failed deployments. This sort of detector is useful not only for knowing when your own deployments have issues, but also for knowing when an upstream service you depend on is having a problem due to a failed (or successful) deployment. Exposing these types of events on your dashboards can help provide more context with less tool sprawl.
How do you get these insights today and how much effort does it require?
Figure 1-2. Overall Jenkins Health: Observe valuable Jenkins agent, build queuing, and even detailed step metrics (with Log Observer) at a glance.
OpenTelemetry, APM, and Infrastructure Monitoring are crucial, and until now separate, tools for understanding your services. With their powers combined in one tool, you can more quickly establish the effects of deployments, understand Jenkins performance, and notify teams of issues with their own or other services related to software builds and releases. But the future is even brighter! These additional insights into Jenkins can help unlock metrics for better understanding the larger impacts of DevOps within your organization.
DevOps Research and Assessment (or DORA) metrics address a fundamental set of concerns when attempting to measure DevOps activity and performance. The four key metrics associated with DORA that may benefit from or require additional Jenkins context are deployment frequency, lead time for changes, change failure rate, and time to restore service.
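As a rough illustration of how Jenkins deployment data could feed two of the DORA metrics, deployment frequency and change failure rate, consider the Python sketch below. Everything in it is an assumption made for the example: the `DEPLOYMENTS` records, the function names, and in particular the simplification of treating a failed deployment job as a change failure (DORA's change failure rate is about changes that cause failures in production, which a Jenkins job result alone may not capture).

```python
from datetime import datetime

# Hypothetical deployment records: (finish time, succeeded?).
# In practice these would come from Jenkins trace or metric data.
DEPLOYMENTS = [
    (datetime(2024, 1, 1, 9, 0), True),
    (datetime(2024, 1, 2, 14, 30), False),
    (datetime(2024, 1, 3, 11, 15), True),
    (datetime(2024, 1, 5, 16, 45), True),
]


def deployment_frequency(deployments, window_days):
    """Average number of deployments per day over a window of window_days."""
    return len(deployments) / window_days


def change_failure_rate(deployments):
    """Fraction of deployments that failed; 0.0 when there are none.

    Simplification: a failed deployment job stands in for a change failure.
    """
    if not deployments:
        return 0.0
    failed = sum(1 for _, succeeded in deployments if not succeeded)
    return failed / len(deployments)


if __name__ == "__main__":
    print("deployments per day:", deployment_frequency(DEPLOYMENTS, 7))
    print("change failure rate:", change_failure_rate(DEPLOYMENTS))
```

Lead time for changes and time to restore service would need additional timestamps (commit time, incident start and end) that this sketch does not model.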
Armed with your new Jenkins metrics and APM data, get out there and scrutinize pipelines, evaluate deployments, and generally push your DevOps Magic™ to the limit!
Want to quickly start understanding your Jenkins deployment? You can sign up to start a free trial of the Splunk Observability Cloud suite of products today!
This blog post was authored by Jeremy Hicks, Observability Field Solutions Engineer at Splunk with special thanks to: Doug Erkkila, Adam Schalock, Todd DeCapua, Tom Martin, Marie Duran, and Joel Schoenberg at Splunk.