This is the third post on URL analysis. Please have a look at the two other posts for more context about what can be done with Splunk to analyze URLs.
This article explains how one can detect DNS tunnels. While you can find lots of very useful apps on Splunkbase to help you analyze DNS data, it is always good for the curious to discover some of the techniques used underneath.
A lot of captive portals are bypassed every day by anyone able to run a DNS request. If someone can run the following command on their machine without being authenticated on the captive portal:
$ host splunk.com
splunk.com has address 54.69.58.243
...
then they can reach any service on the internet through a DNS tunnel. There are a lot of tools out there to create those tunnels. For a great paper on the topic, I encourage you to read Detecting DNS Tunneling from the SANS Institute.
A long time ago, the venerable Claude E. Shannon wrote the paper “A Mathematical Theory of Communication”, which I strongly encourage you to read for its clarity and as an amazing source of information.
He introduced a great measure known as the Shannon entropy, which is useful for discovering the statistical structure of a word or message.
Consider a word as the output of a discrete source that draws from a finite set of characters. Each character occurs with some probability, and each contributes an amount of information that depends on that probability. The entropy of the word is the average of these contributions, weighted by the probability of occurrence of each character.
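The weighted average described above can be written compactly. This is a sketch in standard notation; the symbols $w$, $c$ and $p_c$ are labels chosen here for illustration, not taken from the original post. The sum runs over the distinct characters $c$ of the word $w$:

```latex
H(w) = -\sum_{c} p_c \log_2 p_c,
\qquad
p_c = \frac{\text{occurrences of } c \text{ in } w}{\text{length of } w}
```

For example, in the word "aab" the probabilities are $p_a = 2/3$ and $p_b = 1/3$, so $H = -\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) \approx 0.918$ bits per character. A word made of a single repeated character scores 0.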
This definition can easily be translated into the following Python code (taken from the excellent URL Toolbox on Splunkbase):
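import math

def shannon(word):
    entropy = 0.0
    length = len(word)

    # Count how many times each character occurs in the word
    occ = {}
    for c in word:
        if c not in occ:
            occ[c] = 0
        occ[c] += 1

    # Subtract p * log2(p) for each distinct character
    # (items() works on both Python 2 and Python 3)
    for (k, v) in occ.items():
        p = float(v) / float(length)
        entropy -= p * math.log(p, 2)  # Log base 2

    return entropy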
This can be run directly against any word you have in Splunk:
As you can see, the score is pretty high, which makes sense since the characters in that data are highly varied. If we click on the ut_shannon field to sort in reverse order, this is what you could get:
As one can see, words with a low variety of characters get a low score.
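To make the link between character variety and score concrete, here is a minimal standalone sketch. The function name `shannon_entropy` and the sample words are made up for illustration; this is not the URL Toolbox code, only an equivalent computation using `collections.Counter`.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Shannon entropy of a string, in bits per character."""
    if not word:
        return 0.0
    length = len(word)
    # p * log2(1/p) summed over the distinct characters, with p = n / length
    return sum((n / length) * math.log2(length / n)
               for n in Counter(word).values())


# Made-up words, from least to most varied
for word in ["aaaaaaaa", "abababab", "abcdefgh"]:
    print(word, shannon_entropy(word))
```

A single repeated character gives 0 bits, two equally frequent characters give 1 bit, and eight distinct characters give 3 bits, so more variety means a higher score.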
If we run the following query, interesting results are shown:
sourcetype="isc:bind:query" | eval list="mozilla" | `ut_parse(query, list)` | `ut_shannon(ut_subdomain)` | table ut_shannon, query | sort ut_shannon desc
As you can see in the results here, the high scores come from tunnels made to the domain ip-dns.info, as well as from something unknown that could also be a tunnel: traffic towards greencompute.org.
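For readers who want to try the same idea outside Splunk, the query can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the helper names (`naive_subdomain`, `rank_by_entropy`) and the sample queries are invented for illustration, and the subdomain is taken naively as everything left of the last two labels, whereas the `ut_parse` macro in the query is given a suffix list (`mozilla`) to split the domain properly.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Shannon entropy of a string, in bits per character."""
    if not word:
        return 0.0
    length = len(word)
    return sum((n / length) * math.log2(length / n)
               for n in Counter(word).values())


def naive_subdomain(query):
    """Everything left of the last two labels.

    ASSUMPTION: no suffix list is consulted, so names under
    multi-label suffixes such as co.uk are split incorrectly.
    """
    labels = query.rstrip(".").split(".")
    return ".".join(labels[:-2])


def rank_by_entropy(queries):
    """Return (score, query) pairs sorted by subdomain entropy, highest first."""
    scored = [(shannon_entropy(naive_subdomain(q)), q) for q in queries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample queries, for illustration only
sample = [
    "www.example.com",
    "mail.example.com",
    "mzxw6ytboi2gk3tfojuw4zjanfxgiidb.tunnel.example",
]

for score, query in rank_by_entropy(sample):
    print(f"{score:.3f}  {query}")
```

The long, encoded-looking label sorts to the top, while ordinary host names such as www sink to the bottom. A high score alone is not proof of a tunnel, so the top results are candidates to investigate rather than a verdict.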
I hope this post helps you see the tools and methodologies one can use to find unusual activity based strictly on DNS traffic. More to come…
----------------------------------------------------
Thanks!
Sebastien Tricaud