With the rise of the HTTP Event Collector (HEC) (and with our new Splunk logging driver), we’re seeing more and more of you, our beloved Splunk customers, pushing JSON over the wire to your Splunk instances. One common question we’re hearing is: how can key-value pairs be extracted from fields within the JSON? For example, imagine you send an event like this:
{"event":{"name":"test", "payload":"foo=bar\r\nbar=\"bar bar\"\tboo.baz=boo.baz.baz"}}
This event has two fields, name and payload. Looking at the payload field, however, you can see that it contains additional fields in the form of key-value pairs. Splunk will automatically extract name and payload, but it will not look further into payload to extract the fields within it. That is, not unless we tell it to.
Splunk allows you to specify additional field extractions at index or search time, which can extract fields from the raw payload of an event (_raw). Thanks to its powerful support for regexes, we can use some regex FU (kudos to Dritan Bitincka for the help here on an ultra compact regex!) to extract KVPs from the “payload” specified above.
To specify the extractions, we will define a new sourcetype httpevent_kvp in %SPLUNK_HOME%/etc/system/local/props.conf by adding the entries below. This regex uses negated character classes to specify the keys and values to match on: [^="\\]+ matches a run of characters that are not an equals sign, a double quote, or a backslash. If you are not a regex guru, that last statement might have made you pop a blood vessel.
[httpevent_kvp]
KV_MODE=json
EXTRACT-KVPS = (?:\\[rnt]|:")(?<_KEY_1>[^="\\]+)=(?:\\")?(?<_VAL_1>[^="\\]+)
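If you want to see what the regex picks up before touching your Splunk instance, here is a minimal Python sketch that runs the same pattern against the sample event. A few assumptions: Python spells named groups as (?P<name>...), so the Splunk-specific _KEY_1 and _VAL_1 groups are renamed to key and val; the raw event text is assumed to look like the JSON you sent, with the escapes still present as literal backslash sequences; and extract_kvps is just a helper name made up for this illustration. Python's regex engine is not the one Splunk uses, so treat this as a sanity check rather than a guarantee.

```python
import re

# The pattern from props.conf, adapted for Python: named groups use
# (?P<name>...) and _KEY_1/_VAL_1 are renamed to key/val (illustration only).
KVP_PATTERN = re.compile(r'(?:\\[rnt]|:")(?P<key>[^="\\]+)=(?:\\")?(?P<val>[^="\\]+)')

# Assumed raw event text: the JSON escapes (\r, \n, \t, \") are literal
# backslash sequences, which is why a raw string is used here.
RAW_EVENT = r'{"name":"test", "payload":"foo=bar\r\nbar=\"bar bar\"\tboo.baz=boo.baz.baz"}'


def extract_kvps(raw):
    """Return a dict of every key/value pair the pattern finds in raw."""
    return {m.group("key"): m.group("val") for m in KVP_PATTERN.finditer(raw)}


for key, val in extract_kvps(RAW_EVENT).items():
    print(f"{key} = {val}")
```

Each pair has to be introduced by either the opening :" of a JSON string value or an escaped \r, \n or \t, which is why "name":"test" does not produce a bogus field: there is no equals sign after test.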
Next, configure your HEC token to use the sourcetype httpevent_kvp. Alternatively, you can set sourcetype in your JSON when you send your event.
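For the second option, the sourcetype goes next to the event key at the top level of the JSON you post to HEC. Here is a small Python sketch that builds such a request body; build_hec_body is a hypothetical helper name, and the sketch only constructs the JSON, it does not send anything.

```python
import json


def build_hec_body(event, sourcetype="httpevent_kvp"):
    """Wrap an event in an HEC envelope that sets the sourcetype per event."""
    return json.dumps({"sourcetype": sourcetype, "event": event})


# The payload contains a real carriage return, newline, tab and quotes;
# json.dumps escapes them as \r, \n, \t and \" in the request body.
body = build_hec_body(
    {"name": "test", "payload": 'foo=bar\r\nbar="bar bar"\tboo.baz=boo.baz.baz'}
)
print(body)
```

The resulting string can be passed as the -d argument to curl, or posted with whatever HTTP client you prefer.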
Restart your Splunk instance, and you are ready to test.
We’ll use curl to test if the new sourcetype is working.
curl -k https://localhost:8088/services/collector -H 'Authorization: Splunk 16229CD8-BB6B-449E-BA84-86F9232AC3BC' -d '{"event":{"name":"test", "payload":"foo=bar\r\nbar=\"bar bar\"\tboo.baz=boo.baz.baz"}}'
Heading to Splunk, we can see that the foo, bar and boo.baz fields were properly extracted as interesting fields.
Now heading to “All Fields” we can select each of the new fields.
And then see the values magically show up!
There are a few things to consider when using this approach.
In short, make sure you test.
This approach lets you extract KVPs residing within the values of your JSON fields. This is useful when using our Docker Log driver, and for general cases where you are sending JSON to Splunk.
In the future, hopefully we will support extracting from field values out of the box. In the meantime, this may work for you.
Note: Special thanks to Martin Müller who provided tweaks to the regexes to improve performance and for his suggestions in the considerations section.
----------------------------------------------------
Thanks!
Glenn Block