If you’ve read my previous post on delimiter-based KV extraction, you might be wondering whether you could do more with it (Anonymous Coward did). Well, yes you can, and I am going to cover the “advanced” cases here. As in other posts, before covering the capabilities I’ll first go over some observations and examples.
The following header-body sample is, as you can probably guess, from an Exchange server. The header section contains, among other things, the list of field names, separated from each other by the same delimiter that separates the values in the body section. In this case that delimiter is a tab character (even though our blogging platform chooses to mangle tabs into spaces, gotta love it!!!).
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
The following example shows how a single delimiter can be used to list both field names and values; it is pretty easy for us humans to recognize the key-value pairs:
"url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
Delimiter-based KV extraction solves the header-body problem by adding the capability to assign field names to extracted values, using single-level tokenization/splitting (i.e. a single delimiter) instead of the normal two-level one (pair delimiter plus key/value delimiter) described in the previous post. Unfortunately, this is only available through transforms.conf* and it requires manual specification of the field names (there is no automatic field name detection). To this end, we introduce another transforms.conf configuration variable, defined as follows:
FIELDS = <quoted string, comma/space separated list>
List of names to associate with the extracted field values. The first entry is associated with the first field value, the second with the second value, and so on.
Example for the data above:
FIELDS = "time", "client-ip", "cs-method", "sc-status"
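To make the positional pairing concrete, here is a minimal Python sketch of what FIELDS describes: split the event on one delimiter, then pair the i-th value with the i-th name. This is an illustration, not Splunk’s implementation; the function name header_body_kv is hypothetical, and the sketch simply stops at the shorter list when the number of values and names differ (how Splunk handles such a mismatch isn’t covered here).

```python
def header_body_kv(event: str, delim: str, fields: list[str]) -> dict[str, str]:
    """Hypothetical sketch of header-body extraction: split an event on a
    single delimiter and pair the i-th value with the i-th FIELDS name."""
    values = event.split(delim)
    # zip() stops at the shorter list, so extra values or extra names are ignored
    return dict(zip(fields, values))


fields = ["time", "client-ip", "cs-method", "sc-status"]
event = "\t".join(["14:13:11", "10.1.1.9", "HELO", "250"])
print(header_body_kv(event, "\t", fields))
```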
Thus, to enable header-body KV extraction, one needs to specify one delimiter and a list of field names to attach to the extracted values. Let’s walk through the MS Exchange sample data: (1) we know the field delimiter is the tab character, and (2) the field names, in their correct order, are listed in the header of the file, so all we have to do is quote them. The configuration stanza in transforms.conf should thus look like this:
....transforms.conf....
[exchange]
DELIMS = "\t"
FIELDS = "time", "client-ip", "cs-method", "sc-status"
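Since Splunk won’t detect the field names for you, the “quote the field names” step is easy to script. The sketch below is a hypothetical helper (fields_stanza is not part of Splunk) that assumes the header looks like the Exchange sample: a line starting with “# Fields:” followed by names separated by the same delimiter as the body. It prints the same stanza as above.

```python
def fields_stanza(log_lines, stanza_name, delim="\t"):
    """Hypothetical helper: build a transforms.conf stanza from a
    '# Fields:' header line (assumes the Exchange-style header shown above)."""
    prefix = "# Fields:"
    for line in log_lines:
        if line.startswith(prefix):
            # strip the prefix, then split the names on the body delimiter
            raw = line[len(prefix):].strip().split(delim)
            names = [name.strip() for name in raw if name.strip()]
            break
    else:
        raise ValueError("no '# Fields:' header line found")
    quoted = ", ".join(f'"{name}"' for name in names)
    # escape the delimiter so a tab is written as \t in the stanza
    delim_literal = delim.encode("unicode_escape").decode("ascii")
    return "\n".join([
        f"[{stanza_name}]",
        f'DELIMS = "{delim_literal}"',
        f"FIELDS = {quoted}",
    ])


header = [
    "# Message Tracking Log File",
    "# Exchange System Attendant Version 6.5.7638.1",
    "# Fields: time\tclient-ip\tcs-method\tsc-status",
]
print(fields_stanza(header, "exchange"))
```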
To apply this transformation you can then run “…. | extract exchange reload=t auto=f | ….”. There’s no need to restart the server after editing transforms.conf as long as reload=t is specified in extract (by the way, auto=f turns off automatic KV extraction).
The result of this transformation on one of the events would then be:
"14:13:11 10.1.1.9 HELO 250"
time=14:13:11
client_ip=10.1.1.9
cs_method=HELO
sc_status=250
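If you want to check what a whole log file should produce before pointing Splunk at it, the sketch below skips the “#” header lines, splits each body line on the tab, and pairs the values with the field names. Note that the results above show hyphens turned into underscores (client-ip became client_ip); the sketch mimics that with a simple substitution, which is an assumption about the exact cleaning rule rather than Splunk’s documented behavior. extract_log and clean_key are hypothetical names.

```python
import re

FIELDS = ("time", "client-ip", "cs-method", "sc-status")


def clean_key(name: str) -> str:
    # Assumed cleaning rule: replace anything that is not a letter, digit
    # or underscore with "_" (matches client-ip -> client_ip above).
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def extract_log(lines, delim="\t", fields=FIELDS):
    """Yield one dict per body line; '#' header lines and blanks are skipped."""
    keys = [clean_key(f) for f in fields]
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        yield dict(zip(keys, line.split(delim)))


sample = [
    "# Message Tracking Log File",
    "# Exchange System Attendant Version 6.5.7638.1",
    "# Fields: time\tclient-ip\tcs-method\tsc-status",
    "14:13:11\t10.1.1.9\tHELO\t250",
    "14:13:13\t10.1.1.9\tMAIL\t250",
]
for event in extract_log(sample):
    print(" ".join(f"{k}={v}" for k, v in event.items()))
```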
Easy, huh!? Try it on your data; we’d love to hear back…
*This is only available through configuration because of the amount of configuration information it needs.
There’s yet another trick in delimiter KV extraction: single-delimiter extraction. Single-delimiter extraction pairs consecutive extracted values into key=value pairs as follows: value1=value2, value3=value4, and so on. To enable this extraction from the search command line, set kvdelim and pairdelim to the same value; for the example data above, the extract command looks like this:
.... | extract kvdelim=" " pairdelim=" " auto=f | ....
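The value1=value2, value3=value4 pairing rule is easy to sketch in Python: split on the one delimiter, then take even-position tokens as keys and odd-position tokens as values. This is a hypothetical illustration (single_delim_kv is not a Splunk function). Two choices here are assumptions of the sketch: runs of the delimiter are treated as one, and a trailing token with no partner is dropped.

```python
def single_delim_kv(event: str, delim: str = " ") -> dict[str, str]:
    """Hypothetical sketch of single-delimiter pairing:
    value1=value2, value3=value4, ..."""
    # drop empty tokens so repeated delimiters don't shift the pairing
    tokens = [t for t in event.split(delim) if t]
    # tokens[0::2] are the keys, tokens[1::2] the values; zip drops an odd leftover
    return dict(zip(tokens[0::2], tokens[1::2]))


event = "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
for key, value in single_delim_kv(event).items():
    print(f"{key}={value}")
```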
To enable single-delimiter extraction via transforms.conf, specify either one delimiter or two identical delimiters in the DELIMS config variable. Thus the following two transforms.conf stanzas are equivalent to each other and to the command above:
....transforms.conf....
[single-delim-1]
DELIMS = " "
[single-delim-2]
DELIMS = " ", " "
The results of these extractions for our sample data would be:
"url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
url=http://splunk.com
referer=http://dev.splunk.com
ip=10.10.10.10
NOTE: do not specify a FIELDS variable for single-delimiter extraction, because doing so enables header-body extraction instead.
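The rules for which extraction you get from a given DELIMS/FIELDS combination can be summarized in a small decision function. This is a hypothetical model of the behavior described in this post and the previous one, not Splunk’s actual code; in particular, treating any FIELDS setting as header-body mode follows the NOTE above and is an assumption for combinations the post doesn’t discuss.

```python
from typing import Optional, Sequence


def extraction_mode(delims: Sequence[str], fields: Optional[Sequence[str]] = None) -> str:
    """Hypothetical model of how DELIMS and FIELDS select an extraction mode."""
    if fields:
        # per the NOTE above, a FIELDS setting turns on header-body extraction
        return "header-body"
    if len(delims) == 1 or (len(delims) == 2 and delims[0] == delims[1]):
        # one delimiter, or two identical ones: single-delimiter pairing
        return "single-delimiter"
    if len(delims) == 2:
        # pair delimiter plus key/value delimiter: the normal two-level extraction
        return "two-level"
    raise ValueError("expected one or two delimiters")


print(extraction_mode([" "]))
print(extraction_mode([" ", " "]))
print(extraction_mode(["\t"], ["time", "client-ip", "cs-method", "sc-status"]))
print(extraction_mode(["&", "="]))
```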
Thoughts, questions, ideas and comments are always welcome…
----------------------------------------------------
Thanks!
Ledion Bitincka