Observability with CI/CD in a Developer World

By Splunk

Let’s turn to Wikipedia for a great definition of CI/CD:

"CI/CD bridges the gaps between development and operation activities and teams by enforcing automation in building, testing and deployment of applications. Modern day DevOps practices involve continuous development, continuous testing, continuous integration, continuous deployment and continuous monitoring of software applications throughout its development life cycle. The CI/CD practice or CI/CD pipeline forms the backbone of modern day DevOps operations."

Our goal is observable software: software that exports metrics, traces and logs, and lets us correlate them by the application, machine, and cloud environment they come from.
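To make the idea of correlation concrete, here is a minimal sketch in plain Python. It is not how any particular product stores data; the `Resource` and `Signal` types, their field names, and the sample values are all assumptions made up for illustration. The point is only that when every metric, trace and log carries the same description of where it came from, grouping them back together is simple.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Where a signal came from (hypothetical field names)."""
    service: str
    host: str
    cloud_region: str


@dataclass(frozen=True)
class Signal:
    kind: str          # "metric", "trace", or "log"
    resource: Resource
    body: str


def correlate(signals):
    """Group signals of every kind by the resource that emitted them."""
    grouped = defaultdict(list)
    for signal in signals:
        grouped[signal.resource].append(signal)
    return dict(grouped)


# Illustrative data only; none of these values come from a real system.
checkout = Resource("checkout", "host-1", "us-east-1")
search = Resource("search", "host-2", "us-east-1")

signals = [
    Signal("metric", checkout, "http.server.errors=12"),
    Signal("trace", checkout, "span POST /pay timed out"),
    Signal("log", checkout, "OutOfMemoryError in PaymentWorker"),
    Signal("metric", search, "http.server.errors=0"),
]

by_resource = correlate(signals)
for signal in by_resource[checkout]:
    print(signal.kind, "->", signal.body)
```

Because the resource is a frozen dataclass, it can be used directly as a dictionary key, so one lookup returns the metric, the trace and the log for the same service side by side.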

This is where the Observability portfolio comes in; it is the perfect control room for a development team.

From a developer laptop, it is trivial to collect all the information exposed by the application and send it to the Observability portfolio, where it can be used to create dashboards, integrate with other environments, debug, and watch metrics. By running the OpenTelemetry collector locally, developers can send everything the application exposes to the aggregation and analysis tools, so the data is available immediately and in full fidelity.

Developers no longer need to run their own monitoring harness. They can simply point at their own environment, sharing notes, recreating bugs and diagnosing issues in collaboration with their team. In practice, monitoring touches many different groups, and even within a single discipline such as development, it may be spread across multiple discrete teams. A simplified flow may look something like this:

An alert on one service (metric), handled by an SRE or DevOps engineer
Leads to a timeout error (trace), handled by a DevOps or software engineer
Leads to an infrastructure problem (metric), handled by SRE or DevOps
Leads to a configuration issue (metric, log), handled by DevOps or Ops
Leads to a memory leak in an app (log), handled by a developer or engineer
Leads to new code and integration (CI), handled by a developer or DevOps
Leads to a production push (CD), handled by SRE or DevOps

Obviously, this list is not comprehensive but should show the need for collaboration.

Once the code leaves development and has been vetted, tested, and merged, the CI/CD pipeline carries on to the critical stage of integration testing. Integration tests have traditionally been pass/fail tests that check the behavior of the system. More recently, Spinnaker and others have used system data to automate canary processes or blue/green deployments, adding yet another safeguard to our systems.

In a canary deployment, the pipeline runs the new code on a small portion of the environment. Developers write acceptance criteria as tests of this code, such as the number of errors logged per minute, or whether the time it takes to execute a crucial function has decreased or stayed the same with the new code. In all of these efforts, developers must rely extensively on the observable data of the system. They add new constraints over time to fine-tune the pipeline’s behavior.
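The two acceptance criteria mentioned above can be sketched as a small check in Python. This is an illustration under stated assumptions, not the way Spinnaker or any other tool evaluates a canary: `WindowStats`, `evaluate_canary`, the ratio parameters, and the sample numbers are all hypothetical, and a real pipeline would pull these values from its observability backend rather than from hard-coded lists.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WindowStats:
    """Observed data for one deployment over the same time window."""
    errors_per_minute: list   # one sample per minute
    latency_ms: list          # timings of the crucial function


def evaluate_canary(baseline, canary, max_error_ratio=1.0, max_latency_ratio=1.0):
    """Return (passed, failures) for the two criteria described in the text.

    A ratio of 1.0 means the canary must be no worse than the baseline.
    """
    failures = []
    if mean(canary.errors_per_minute) > mean(baseline.errors_per_minute) * max_error_ratio:
        failures.append("errors per minute increased")
    if mean(canary.latency_ms) > mean(baseline.latency_ms) * max_latency_ratio:
        failures.append("crucial function got slower")
    return (not failures, failures)


# Illustrative numbers only.
baseline = WindowStats(errors_per_minute=[2, 3, 1], latency_ms=[120, 130, 125])
good_canary = WindowStats(errors_per_minute=[1, 2, 3], latency_ms=[118, 121, 124])
bad_canary = WindowStats(errors_per_minute=[6, 7, 8], latency_ms=[119, 122, 125])

print(evaluate_canary(baseline, good_canary))  # (True, [])
print(evaluate_canary(baseline, bad_canary))   # (False, ['errors per minute increased'])
```

Returning the list of failed criteria, rather than a bare pass/fail, makes it easier to tighten or add constraints over time, since each new check only appends its own reason.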

The Observability portfolio allows developers to quickly monitor and create alerts on metric data. Developers can easily monitor performance and see errors through the use of traces. The portfolio also exposes the logs of the application, with the ability to parse them for information and create triggers. In fact, most of us aren’t going to stare at dashboards looking for trouble or watch logs live-tail on a screen. We depend on alerts to call attention to problems and to help us locate the causes or even attempt to correct them. Thus we couple the Observability portfolio with advanced AI/ML analytics to help make sense of what is going on: avoiding alert storms, understanding concepts like seasonality and historical impact, and recognizing events such as a code deploy. With AI in place, you can even choose to automate responses, such as rolling back problematic deploys.
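As a rough illustration of two of these ideas, the sketch below uses a plain rule rather than AI/ML analytics: an alert fires only when the error rate stays above a threshold for several consecutive samples (so a single spike does not page anyone), and a rollback is suggested only when that alert follows a recent deploy. The function names, the threshold, the window, and the sample data are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int
    error_rate: float


def first_sustained_breach(samples, threshold, for_minutes):
    """Return the minute at which error_rate has exceeded `threshold`
    for `for_minutes` consecutive samples, or None if it never does."""
    streak = 0
    for sample in samples:
        streak = streak + 1 if sample.error_rate > threshold else 0
        if streak >= for_minutes:
            return sample.minute
    return None


def should_roll_back(alert_minute, deploy_minute, window):
    """Suggest a rollback only if the alert fired within `window`
    minutes after the deploy event."""
    if alert_minute is None or deploy_minute is None:
        return False
    return 0 <= alert_minute - deploy_minute <= window


# Illustrative data: one isolated spike at minute 1, then a sustained
# rise that starts right after a hypothetical deploy at minute 3.
rates = [0.01, 0.09, 0.01, 0.02, 0.08, 0.09, 0.10, 0.07]
samples = [Sample(minute, rate) for minute, rate in enumerate(rates)]

alert_minute = first_sustained_breach(samples, threshold=0.05, for_minutes=3)
print(alert_minute)                                                 # 6
print(should_roll_back(alert_minute, deploy_minute=3, window=10))   # True
```

The isolated spike at minute 1 resets the streak and never alerts, while the sustained rise does. A production system would replace the fixed threshold with something that accounts for seasonality and history, which is where the analytics described above come in.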

This investment pays off handsomely: by collectively establishing metrics, traces, and logs, creating dashboards, and setting thresholds for alerts, developers have created a rich operations experience that directly sustains and enriches production. The development of code is now matched with the development of its monitoring and alerting, allowing it to roll out quickly and confidently to production.

So using observability in your development and deployment pipeline can reduce the time it takes to recognize and respond to issues in your apps and environments. The portfolio approach, integrating and correlating all the classes of data, gives you insights that are valuable and timely.

Learn more about end-to-end observability and get a free 14-day trial of Splunk Infrastructure Monitoring.

This blog post was co-authored by Antoine Toulme, Tucker Logan and Dave McAllister.

----------------------------------------------------
Thanks!
Dave McAllister
