Managing the complexity of today’s cloud-native infrastructure has increased the need for observability. As cloud adoption continues to grow, delivering a better customer experience, scaling efficiently and innovating faster have never been more important. Two technologies are helping many organizations reach these goals sooner: Monitoring-as-Code and Infrastructure-as-Code.
This blog post will cover how Monitoring-as-Code and Infrastructure-as-Code work hand in hand and how these technologies bring efficiency across your CI/CD development pipeline. I’ll also walk you through a few simple steps for setting up Splunk’s Terraform provider to easily make monitoring and observability part of your application code.
Traditionally, deploying IT infrastructure was daunting and involved multiple teams and manual provisioning steps. Any error made during this lengthy, complex process could cause the application deployment to fail or introduce performance issues, possibly affecting your customer experience. In response to the many problems with manual provisioning, Infrastructure-as-Code was born, and large-scale systems came to be declared as code in configuration files. An Infrastructure-as-Code system lets users specify what the final infrastructure should look like and then relies on the tool to do the work needed to bring the running infrastructure to that desired state. A leading example of an Infrastructure-as-Code platform is Terraform, from HashiCorp. When changes are required, simple modifications to your Terraform configuration are quickly reflected in the running infrastructure.
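To make the "desired state" idea concrete, here is a minimal sketch of a declarative Terraform configuration. It assumes the hashicorp/kubernetes provider and a kubeconfig at the default path; the namespace name "movies" is purely illustrative. The file describes only what should exist, and Terraform works out the steps needed to get there.

```hcl
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

provider "kubernetes" {
  # Assumption: cluster credentials live in the default kubeconfig file.
  config_path = "~/.kube/config"
}

# Desired state: a namespace with this name exists.
# Terraform creates it if it is missing and leaves it alone if it is already there.
resource "kubernetes_namespace" "movies" {
  metadata {
    name = "movies" # illustrative name, not from a real deployment
  }
}
```

Running terraform plan against a configuration like this shows what would change before anything is applied, which is what makes the workflow reviewable like any other code change.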
Setting up monitoring for your infrastructure and applications can create many of the same challenges as manually provisioning that infrastructure. These challenges typically appear after the initial implementation and become more noticeable as your monitoring deployment grows in complexity. When you use Monitoring-as-Code, your monitoring configuration sits closer to your application and development workflows. In fact, it is literally in those same workflows: checked in to your version control system and changed as part of code deployment by your CI/CD system. Whether your infrastructure lives in the cloud or on-premises, the monitoring assets you need to properly observe your application are never left behind.
It is important to remember the difference between Monitoring-as-Code and using monitoring or observability tools on their own. While observability tools like Splunk Observability Cloud provide full-fidelity monitoring and troubleshooting across infrastructure, applications and users in real time, they can’t automatically determine your business logic or which specific detectors are most vital to your business metrics. Monitoring-as-Code manages the entire approach to how data is collected from your application and then used to help you solve problems. Let’s take a look at an example.
HashiCorp Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can be extended with providers, including the Splunk Terraform provider. This provider interacts with resources supported by Splunk Observability Cloud and lets us write a configuration that we can include in our infrastructure configuration’s HCL or run as a separate Terraform deployment, using an API token for authentication.
In this example, we have deployed a microservices-based application using Kubernetes. The application is built using four different services. We will use Terraform to deploy a detector that will alert in a critical state if one of the four microservices is down.
To begin, we will need an API authentication token to authenticate with Splunk Observability Cloud. Navigate to the account settings and click Access Tokens.
Click on New Token, provide a name for your access token and select API Token as the permission.
With your API token created, you can now click on the Show Token link to view and use the token as part of your code when using Terraform.
With our API token in place and Terraform installed, we can use the Terraform CLI to deploy the following code. (Note: SignalFx is the former name for certain components of Splunk Observability Cloud.)
terraform {
  required_providers {
    signalfx = {
      source  = "splunk-terraform/signalfx"
      version = "6.8.0"
    }
  }
}

provider "signalfx" {
  # It is strongly recommended to use a secret management Terraform Provider such as Vault, but for this example we include the token here.
  auth_token = "API TOKEN HAS BEEN OMITTED"
  api_url    = "https://api.us1.signalfx.com" # use your custom Splunk Observability realm URL
}

# The detector threshold below references var.pod_amount, so the variable must be declared.
# No default is set here; supply the expected number of ready pods when you run Terraform.
variable "pod_amount" {
  type        = number
  description = "Expected number of ready pods across the movie application microservices"
}

resource "signalfx_detector" "movieappspods_notready" {
  name        = "One or more Movie microservice pods are not ready"
  description = "This alert will trigger in the event a microservice pod for the movie applications is in a non-ready state."

  # SignalFlow program: count the ready containers for the four services and
  # fire when the count drops below the expected pod amount.
  program_text = <<-EOF
    A = data('k8s.container.ready', filter=filter('metric_source', 'kubernetes') and filter('app', 'movies', 'actors', 'dashboard', 'directors'), rollup='count').count().publish(label='A')
    detect(when(A < threshold(${var.pod_amount}))).publish('Movies Application Microservices Pods')
  EOF

  rule {
    description   = "One or more movie application microservices pods are not ready"
    severity      = "Critical"
    detect_label  = "Movies Application Microservices Pods" # must match the label published by detect()
    notifications = ["Email,you@example.com"]
  }
}
When inspecting the code, you can see that we provide the fields the provider needs to authenticate with Splunk Observability Cloud (auth_token, api_url) and the resource (signalfx_detector) that creates the detector. Detectors are declared by placing SignalFlow in the program_text field. For the proper SignalFlow syntax, use the Developer Guide for Splunk Observability Cloud, or, within the Splunk Observability Cloud GUI, select “Show SignalFlow” from the ellipsis menu of any existing detector or chart. SignalFlow also supports manipulations, aggregations and other operations run on the data in real time, so your Monitoring-as-Code deployment will provide the same robust real-time data and analytics that a GUI-created detector would.
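Because this configuration is meant to live in version control, you will probably not want the token committed alongside it. One common approach, sketched below as an alternative to the provider block above, is to pass the token through a sensitive input variable. The variable name signalfx_auth_token is a hypothetical choice for illustration; a secret manager such as Vault remains the stronger option.

```hcl
# Hypothetical variable name; "sensitive" keeps the value out of plan and apply output.
variable "signalfx_auth_token" {
  type        = string
  sensitive   = true
  description = "API token for Splunk Observability Cloud"
}

provider "signalfx" {
  auth_token = var.signalfx_auth_token
  api_url    = "https://api.us1.signalfx.com" # use your own realm URL
}
```

The value can then be supplied at run time, for example through the TF_VAR_signalfx_auth_token environment variable in your CI/CD system, so the token never appears in the repository.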
Next, complete the Terraform deployment, navigate to the detectors within Splunk Observability Cloud, and your detector is ready to alert you in the event of a microservice outage.
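If you would rather not hunt for the new detector in the UI, an output block can print a direct link once the deployment finishes. This is a small sketch that assumes the url attribute exported by the signalfx_detector resource; the output name detector_url is arbitrary.

```hcl
# Prints a link to the detector after "terraform apply" completes.
output "detector_url" {
  description = "Link to the movie application pod detector in Splunk Observability Cloud"
  value       = signalfx_detector.movieappspods_notready.url
}
```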
While the UI makes it simple for anyone to create Observability resources like charts and detectors without learning a new language, we recognize that advanced organizations want to be able to store their monitoring setup along with their application code. Storing all the required data in one source-of-truth (your source repo) makes it effortless to make sure that essential monitoring is always deployed with your application, and that it is always up-to-date.
Additionally, Terraform Cloud agents emit OpenTelemetry-format data (metrics and traces), so you can analyze their performance and troubleshoot issues in Splunk Observability Cloud. You can also use Terraform to deploy the OpenTelemetry Collector, keeping everything controlled from one centralized location.
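As a rough sketch of what deploying the collector with Terraform might look like, the following uses the hashicorp/helm provider to install the Splunk OpenTelemetry Collector Helm chart into a Kubernetes cluster. Several things here are assumptions rather than part of the original example: the kubeconfig path, the cluster name, the variable name splunk_access_token, and the use of the 2.x helm provider syntax. Check the chart’s documentation for the values your environment needs.

```hcl
terraform {
  required_providers {
    # If this lives in the same module as the signalfx provider,
    # merge this entry into the existing required_providers block.
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # the nested "set" blocks below use the 2.x syntax
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config" # assumed kubeconfig location
  }
}

# Hypothetical variable holding an access token that is allowed to ingest data.
variable "splunk_access_token" {
  type      = string
  sensitive = true
}

resource "helm_release" "splunk_otel_collector" {
  name       = "splunk-otel-collector"
  repository = "https://signalfx.github.io/splunk-otel-collector-chart"
  chart      = "splunk-otel-collector"

  set {
    name  = "clusterName"
    value = "movies-cluster" # hypothetical cluster name
  }

  set {
    name  = "splunkObservability.realm"
    value = "us1" # match the realm used in your api_url
  }

  set_sensitive {
    name  = "splunkObservability.accessToken"
    value = var.splunk_access_token
  }
}
```

Keeping the collector and the detectors in the same repository means a single review and a single pipeline run can cover both data collection and alerting.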
You can sign up to start a free trial of the suite of products – from Infrastructure Monitoring and APM to Real User Monitoring and Log Observer. Learn more about Terraform and the Splunk Terraform Provider and get started with Monitoring-as-Code today!
----------------------------------------------------
Thanks!
Johnathan Campos