“How do we receive this GitLab webhook and turn it into something useful?”
“Can we include that JSON response from our legacy HTTP-based software in our observability data as logs or metrics?”
“What tool do I use to log and metricize a generic incoming HTTP request?”
The answer to all of these questions could be the OpenTelemetry collector binaries from the opentelemetry-collector-contrib repository. Want solutions to some common telemetry problems, like how to log, transform, and metricize a generic incoming webhook? How about a specific example that pulls code-push metrics and logs out of a standard GitLab webhook? It might be a bit of a journey… but if that sounds useful, read on.
In this blog post we will walk through the whole process: receiving a GitLab webhook with the OpenTelemetry collector, parsing it into logs, deriving metrics from those logs, and exporting both to Splunk.
Why a GitLab webhook, and why push events? Logs of when and where code was pushed are useful to security teams, especially when the pushes go directly to the main branch. Similarly, development teams and engineering management can benefit from metrics around pushes (and, again, pushes directly to main branches). Metrics and logging around code pushes can even serve as integral glue for loftier automation goals like automated rollback. It is also straightforward to configure GitLab to send only push events, which makes the exercise easier to test. To accomplish these goals we’ll use every major part of the OpenTelemetry collector pipeline, including processors and connectors. Ready? Let’s look at some configs!
```yaml
receivers:
  webhookevent:
    endpoint: 0.0.0.0:9444
    path: "/event"
```
This is pretty straightforward YAML for configuring an OTEL collector. In this case we’re using the `webhookevent` receiver from the opentelemetry-collector-contrib repository (which was created by a teammate of mine, Sam Halpern).
The Webhook Event Receiver is a log receiver designed to accept generic signals from virtually any push-based source. It works by creating an HTTP server that the OpenTelemetry collector manages at runtime. Point your push-based sources at the exposed endpoint, and incoming requests are received and converted into standardized OpenTelemetry logs, which can then be consumed by processors and connectors or simply exported as-is.
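To sanity-check the receiver on its own, you can point anything that speaks HTTP at it. Below is a minimal Python sketch (not from the original post) that simulates a GitLab push webhook against the endpoint configured above; the payload is a trimmed-down, hypothetical push body containing only the fields this post cares about, and it assumes the collector is running locally on port 9444.

```python
# Minimal sketch: simulate a GitLab push webhook against the webhookevent receiver.
# Assumes the collector from the config above is running locally on port 9444.
import json
import urllib.request

# Trimmed-down, hypothetical push payload; a real GitLab webhook carries many more fields.
payload = {
    "object_kind": "push",
    "ref": "refs/heads/main",
    "user_username": "example_user",
    "project": {"namespace": "example-group"},
    "repository": {"name": "example-repo"},
}

request = urllib.request.Request(
    "http://localhost:9444/event",  # endpoint + path from the receiver config
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    # A 2xx status means the receiver accepted the request and emitted it as a log.
    print(response.status)
```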
Next we’ll look at how we transform the JSON we received from the webhook into the proper logging attributes we expect of our logs.
```yaml
processors:
  transform/logs:
    log_statements:
      - context: log
        statements:
          - merge_maps(attributes, ParseJSON(body), "upsert")
          - set(attributes["repository.name"], attributes["repository"]["name"]) where attributes["repository"]["name"] != nil
          - set(attributes["project.namespace"], attributes["project"]["namespace"]) where attributes["project"]["namespace"] != nil
          - set(attributes["branch"], attributes["ref"]) where attributes["ref"] != nil
```
Ok, that may look pretty complex… In reality we’re just parsing the JSON from the GitLab webhook and reshaping a couple of field names so we can promote them to log attributes; those attributes become important in the next step! The `transform` processor runs on every log the pipeline receives and performs a couple of functions: `merge_maps` with `ParseJSON` upserts the parsed webhook body into the log’s attributes, and the `set` statements copy the nested repository name, project namespace, and ref into flat `repository.name`, `project.namespace`, and `branch` attributes, skipping any field that is missing.
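If the OTTL syntax is new to you, here is a rough Python analogue of what those statements do to a single log record. This is purely illustrative (the function name is made up, and the collector does none of this in Python), but it mirrors the logic one-for-one:

```python
import json


def flatten_gitlab_push(log_body: str) -> dict:
    """Illustrative analogue of the transform/logs statements above (not collector code)."""
    # merge_maps(attributes, ParseJSON(body), "upsert"):
    # parse the webhook body and upsert every top-level field into the attributes map.
    attributes = json.loads(log_body)

    # set(attributes["repository.name"], ...) where attributes["repository"]["name"] != nil
    if attributes.get("repository", {}).get("name") is not None:
        attributes["repository.name"] = attributes["repository"]["name"]

    # set(attributes["project.namespace"], ...) where attributes["project"]["namespace"] != nil
    if attributes.get("project", {}).get("namespace") is not None:
        attributes["project.namespace"] = attributes["project"]["namespace"]

    # set(attributes["branch"], attributes["ref"]) where attributes["ref"] != nil
    if attributes.get("ref") is not None:
        attributes["branch"] = attributes["ref"]

    return attributes
```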
Now that we have our attributes set up the way we’d like them, we can perform the magic of turning logs into metrics.
```yaml
connectors:
  count/gitlab-push:
    logs:
      gitlab.metrics:
        description: gitlab webhooks
        conditions:
          - attributes["object_kind"] == "push"
        attributes:
          - key: branch
            default_value: None
          - key: project.namespace
            default_value: None
          - key: repository.name
            default_value: None
          - key: object_kind
            default_value: None
          - key: user_username
            default_value: None
```
What is a connector? To quote opentelemetry.io:
“A connector is both an exporter and receiver. As the name suggests a Connector connects two pipelines: It consumes data as an exporter at the end of one pipeline and emits data as a receiver at the start of another pipeline. It may consume and emit data of the same data type, or of different data types.”
So we can use a connector to turn one kind of telemetry, like a log, into another kind of telemetry, like a metric. In this case, we use the `count` connector on our logs to create a metric counting any webhook events received that have an `object_kind` of `push`. We’ll create attribute dimensions on this count metric for branch, project namespace, repository name, object kind (which will always be `push` for now), and the username of the user performing the push. Using these dimensions we can easily chart how often pushes happen to a given repo (or a given project if multiple repos send in their webhooks), who is doing those pushes, and who is on the naughty list for pushing directly to `main`! But we still need to send our log and metric out to our fancy Splunk tools, so let’s do that next.
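First, though, a quick conceptual aside: the `count` connector is effectively keeping one counter per unique combination of the configured dimension values. The sketch below shows that idea in plain, illustrative Python; it is not the connector’s actual internals, and the names are made up.

```python
from collections import Counter

# Attribute keys configured as dimensions on the count connector above.
DIMENSIONS = ("branch", "project.namespace", "repository.name", "object_kind", "user_username")

push_counts: Counter = Counter()


def count_push_event(attributes: dict) -> None:
    """Increment the gitlab.metrics count for logs matching the connector's condition."""
    # conditions: attributes["object_kind"] == "push"
    if attributes.get("object_kind") != "push":
        return
    # Each unique combination of dimension values becomes its own metric time series,
    # with "None" standing in for the connector's default_value when a key is missing.
    key = tuple(str(attributes.get(dim, "None")) for dim in DIMENSIONS)
    push_counts[key] += 1
```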
```yaml
exporters:
  signalfx:
    access_token: "${env:SPLUNK_ACCESS_TOKEN}"
    api_url: "${env:SPLUNK_API_URL}"
    ingest_url: "${env:SPLUNK_INGEST_URL}"
    sync_host_metadata: false
  splunk_hec:
    token: "${env:SPLUNK_HEC_TOKEN}"
    endpoint: "${env:SPLUNK_HEC_URL}"
  # Debug
  logging:
    loglevel: info
```
```yaml
service:
  telemetry:
    logs:
      level: "info"
  pipelines:
    metrics:
      receivers: [count/gitlab-push]
      exporters: [signalfx]
    logs:
      receivers: [webhookevent]
      processors:
        - transform/logs
      exporters: [splunk_hec, count/gitlab-push]
```
If your eyes didn’t glaze over from that code block and you’re still with me, congratulations, you’re almost at the finish line! Here we configure our exporters for Splunk Observability (previously known as SignalFx, hence the `signalfx` exporter name) and `splunk_hec`, along with environment variable references for the important token and URL settings. What happens next, under `service:` and `pipelines:`, is important.
We configure our telemetry `pipelines` to:

- Receive webhook events into the `logs` pipeline via the `webhookevent` receiver.
- Run those logs through the `transform/logs` processor to parse and flatten the GitLab JSON.
- Export the finished logs to `splunk_hec` and, in parallel, hand them to the `count/gitlab-push` connector.
- Receive the resulting counts in the `metrics` pipeline (where the connector acts as the receiver) and export them to `signalfx`, i.e. Splunk Observability.
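If it helps to picture those two pipelines as plain data flow, here is a purely illustrative Python sketch of the routing. None of these names are collector APIs, and the real pipelines are wired up by the collector itself; the callables simply stand in for the components named in the config.

```python
# Purely illustrative data-flow sketch of the two pipelines defined above; not collector code.
from typing import Callable, Dict, List


def run_pipelines(
    raw_bodies: List[str],                    # events emitted by the webhookevent receiver
    transform: Callable[[str], Dict],         # stands in for the transform/logs processor
    export_log: Callable[[Dict], None],       # stands in for the splunk_hec exporter
    count_connector: Callable[[Dict], None],  # stands in for the count/gitlab-push connector
    export_metrics: Callable[[], None],       # stands in for the signalfx exporter
) -> None:
    # logs pipeline: webhookevent -> transform/logs -> [splunk_hec, count/gitlab-push]
    for body in raw_bodies:
        attributes = transform(body)
        export_log(attributes)       # the log itself goes to Splunk
        count_connector(attributes)  # the same log also feeds the count connector
    # metrics pipeline: count/gitlab-push -> signalfx
    export_metrics()                 # the accumulated counts go to Splunk Observability
```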
And now you’ve done it! You’ve accepted a generic webhook into your OTEL receiver, parsed the JSON into a log, sent that log to Splunk, and then turned the log into a metric time series and sent it to Splunk Observability.
What can you do now? Let security know you’ve helped them out with more data. Make a chart of pushes contributing to branches for a major release and see work done over time by your (or other) teams. Create a naughty list of users pushing to main. Throw a pizza party!
Figure 1-1. Who's been naughty and pushed directly to the main branch?
But in reality, the real treasure is the things we’ve learned along the way. Right? We’ve learned that webhook JSON data can become log data, how to massage that log data, and even how to turn it into a basic metric. These are advanced capabilities and problem-solving tools that may not be immediately apparent but can be lifesavers. Sometimes a specific set of telemetry data is required but comes from an unexpected or non-native source. But if you can get it in as JSON? You can probably do some amazing things in the OpenTelemetry collector.
Interested in knowing more about OpenTelemetry and how it works with Splunk products? You can easily take them all for a spin! Start a free trial of Splunk Observability and get started with OpenTelemetry today.
This blog post was authored by Jeremy Hicks, Staff Observability Field Solutions Engineer at Splunk, and Sam Halpern, Field Solutions Engineer at Splunk.