What’s the grooviest Splunk search command goin’ round? It’s diff man, can you dig it?
That’s right, diff. What other command is based on a *nix file comparison utility that’s been around since the early 70’s?
Splunk’s diff operates just like good ol’ diff does on a *nix platform – it compares two inputs and tells you what the differences are, in a very distinct format. But where *nix diff normally compares two files, Splunk’s diff compares the content of two events.
We can use diff to compare one field in an event to that same field in another event, or we can go for broke and have diff compare “_raw” – or the content of the entire event – between two events. That’s what we’ll do below. I’m using output from a Linux server, but this could be any event content in Splunk you can imagine.
Here we are using the Splunk for *nix TA to retrieve, on the hour (the default), a list of software packages installed on our *nix systems. (This is a standard scripted input provided by the package script.) The output from one of these commands looks like this against my lab server called “airseabattle”. (Yes, that’s one of the original Atari 2600 cartridges brought to market in 1977. See how it’s all coming together?)
Note that there are a whole lot more lines in this event: Splunk is just displaying the first 10, but there are over 700. Since we know that the scripted input runs once an hour, the search above will only return one event due to the earliest and latest time parameters.
Now, let’s say that we want to determine all of the software packages that have changed between the time the system was in a “known good” state (the output above) and now. (There are far better ways of doing this in Splunk with package output – but let’s just keep on truckin’. This is about diff…)
The diff command has a few arguments. If you run diff by itself with no arguments, it compares the first event returned by your search to the second event, and it uses _raw as the field to compare. In other words, it compares everything in the first event to everything in the second, using the diff algorithm. Options for diff include the ability to tell Splunk which field to use for comparison (attribute=), which event in the results to compare against which (position1= and position2=), whether to display a header explaining the output (diffheader=), and how much content in bytes will be evaluated by diff (maxlen=, which defaults to 100KB; setting it to 0 removes the limit).
So let’s tell Splunk to retrieve my “baseline” (old) list of installed packages from 3 days ago for “airseabattle”, and append the most recent (new) output of the same, and use diff to compare them:
index=os sourcetype=package host=airseabattle earliest=-3d@d latest=-3d@d+1h | append [search index=os sourcetype=package host=airseabattle earliest=-60m@m latest=now ] | diff maxlen=0
When we look at the output we can see that the package wget has been updated on this server. The context is from the old event, compared to the new. Content that has been added to the event is shown here with a “+” sign, so we can see that wget has been updated to version 1.11.el6_5. Content that has been deleted from the event is shown here with a “-” sign, so we can see that wget is no longer at version 1.8.el6.
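Since diff leans on Python's difflib, the “+” and “-” output can be reproduced outside Splunk. What follows is a minimal sketch, not Splunk's actual implementation: the three package names are made up to stand in for the old and new events, and unified() is a hypothetical helper wrapped around the standard difflib.unified_diff call.

```python
import difflib

# Hypothetical package lists standing in for the old and new events.
old_event = [
    "tar-1.23-11.el6.x86_64",
    "wget-1.12-1.8.el6.x86_64",
    "zip-3.0-1.el6.x86_64",
]
new_event = [
    "tar-1.23-11.el6.x86_64",
    "wget-1.12-1.11.el6_5.x86_64",
    "zip-3.0-1.el6.x86_64",
]


def unified(old_lines, new_lines, context=3):
    """Return the unified-diff lines comparing two lists of lines."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="old",
            tofile="new",
            n=context,    # lines of context kept around each change
            lineterm="",  # our input lines carry no trailing newline
        )
    )


diff_lines = unified(old_event, new_event)
for line in diff_lines:
    print(line)
```

The printed lines that start with “-” exist only in the old list, the ones that start with “+” exist only in the new list, and the lines that start with a space are unchanged context.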
The changes are preceded with these headers, which conform to unified diff format:
@@ -66,6 +66,7 @@
@@ -368,7 +369,6 @@
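These hunk ranges simply mean that the changed sections of the event output start at lines 66 and 368 of the old event. In each header, the pair after the “-” is the starting line and line count in the old event, and the pair after the “+” is the starting line and line count in the new event. The counts include the lines of context displayed around each change.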
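If you ever need to pull those numbers apart programmatically, here is a small sketch that reads a unified diff hunk header. The function name parse_hunk_header and the dictionary keys are my own inventions for illustration. The one piece of format knowledge it relies on is that unified diff may leave out a line count when that count is 1.

```python
import re

# Matches headers such as "@@ -66,6 +66,7 @@"; each ",count" part is optional.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Split a unified-diff hunk header into start lines and line counts."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = match.groups()
    return {
        "old_start": int(old_start),
        # An omitted count means the range covers a single line.
        "old_len": int(old_len) if old_len is not None else 1,
        "new_start": int(new_start),
        "new_len": int(new_len) if new_len is not None else 1,
    }


first = parse_hunk_header("@@ -66,6 +66,7 @@")
second = parse_hunk_header("@@ -368,7 +369,6 @@")
print(first)
print(second)
```

Read this way, the first hunk covers 6 lines of the old event and 7 of the new, so it gained a line. The second covers 7 old lines and 6 new ones, so it lost a line.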
A few other things to know about diff. The command calls a Python-authored search command under the covers, and leverages the standard difflib library for Python. Previous versions of diff did not support the maxlen argument. And since you’re calling Python underneath when you use diff, it is best to keep the number of multi-line events that you feed it to a minimum, or resource consumption issues may arise.
That’s about it for diff. Have any interesting use cases for diff that you’d like to share? Or maybe, in the spirit of the 70’s, you want to share your favorite Starsky and Hutch episode or Jimmy Carter State of the Union speech? Sound off below.