So far in this series, we’ve shared some key techniques that are required for threat hunting using Splunk — we’ve discussed how to…
This post will continue by introducing a set of foundational Splunk threat-hunting techniques that will help you filter data. And yes, we’re going to keep on keepin’ on with the stats command, too.
(Part of our Threat Hunting with Splunk series, this article was originally written by Dave Herrald. We’ve updated it recently to maximize your value.)
Why is filtering data important? Well, Splunk allows you to store gigabytes, terabytes, or even petabytes of full-fidelity security data — yet the evidence you are seeking during a hunt or investigation is often contained in just a few events.
You need to eliminate the noise and expose the signal.
To do this, we will focus on three specific techniques for filtering data that you can start using right away. For all three tutorials below, we use data from our Boss of the SOC v1.0 data set.
The most obvious (but often overlooked) technique for reducing the number of events returned by your Splunk search — and getting you closer to actionable results — is to specify an appropriate time range.
If you can put a left and right boundary on the timeline of your hunt, you enable Splunk to ignore events from time periods that have nothing to do with your hypothesis, potentially saving you valuable time and system resources along the way.
For most Splunk users, the easiest way to specify the time range is to use the time range picker as shown in Figure 1 (below). In this example, I’m looking at some DNS events from our Boss of the SOC v1.0 data set. Specifically, I’ve asked Splunk to search all DNS activity on August 24, 2016:
In this case, selecting an appropriate time range helped us realize a 96% reduction in both the number of events and the time to run the search!
Your data and hunting hypotheses will vary, but remember — when hunting in Splunk, it pays to pay attention to time.
Figure 1: Filtering events based on timestamp
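The time range picker is not the only option. The same boundaries can be written into the search itself with the "earliest" and "latest" time modifiers, which is handy when you want to save or share a hunt. Here is a minimal sketch, assuming the Splunk Stream DNS sourcetype used later in this post and the August 24, 2016 window described above:

```spl
sourcetype=stream:dns earliest="08/24/2016:00:00:00" latest="08/25/2016:00:00:00"
```

Absolute times use the %m/%d/%Y:%H:%M:%S format. Time modifiers in the search string take precedence over whatever is selected in the time range picker, and the timestamps are interpreted in your configured time zone, so adjust the boundaries if your data was indexed in a different one.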
Splunk is often referred to as a search engine for your data, and it’s easy to see why when you enter a simple phrase into the search app. Events containing this phrase begin to appear, usually within just a few seconds. We sometimes refer to searching in this way as "super-grepping", and — while it can be effective — Splunk has a lot more power under the hood.
One excellent way to up-level your Splunk search skills (and to become a more effective threat hunter in the process) is to begin harnessing the power of field-value expressions to narrow your search.
As Splunk is returning results, it’s also extracting fields from each event. You can take advantage of these fields using the Splunk Search Processing Language (SPL). Events generated by different systems in your environment will have different fields; however, all events in Splunk have a few common fields, including:
These fields are special! They are extracted and stored when the events are indexed, which in turn makes searches that use these fields very fast.
One of the best ways to begin filtering events in Splunk is to search for a specific sourcetype. Simply include a field-value pair like the following as early in your search (meaning as far to the left) as possible. This example shows a simple search that filters results to include only Microsoft Sysmon events.
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"
Next, let’s look at how you can combine multiple fields to narrow your search even further. Note that when you search multiple fields, Splunk implicitly combines the search terms with a logical "AND" operator, so an event must match every term to be returned.
Figure 2 below includes an example of a multi-field search that returns all the Microsoft Sysmon events that came from the system named "we4781srv".
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" host="we4781srv"
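Because "AND" is implied, widening a search to more than one value of the same field needs an explicit "OR", usually wrapped in parentheses so the rest of the search still applies to both values. A short sketch, where "<second_host>" is a placeholder for another host name in your own data and not a system from the data set:

```spl
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" (host="we4781srv" OR host="<second_host>")
```

Boolean operators in SPL must be written in uppercase. A lowercase "or" would be treated as an ordinary search term.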
Note that these sample searches are subject to the time window you chose for the search (see technique 1 above), and they will only retrieve results from the Splunk indexes to which you’ve been granted access. In production environments, it’s a good practice — and sometimes required — to add "index=<myindexname>" to the beginning of your search.
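Applied to the multi-field search above, that looks like the following sketch, where "<myindexname>" stands in for whichever index holds your Sysmon data:

```spl
index=<myindexname> sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" host="we4781srv"
```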
Now, let’s have a look at the additional fields available in these Microsoft Sysmon events.
Figure 2: Examining fields in a Microsoft Sysmon event
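If you prefer to list the available fields with a search, one option is the "fieldsummary" command, which returns one row per field along with how often it appears and how many distinct values it has. A minimal sketch, reusing the Sysmon search from this section:

```spl
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" host="we4781srv"
| fieldsummary
| table field count distinct_count
| sort -count
```

Fields with a high count and a small number of distinct values are often good candidates for further field-value filtering.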
A few helpful tips as you start to tailor your Splunk threat hunting searches using fields:
Get more details on the Splunk search command.
Finally, let's look at a quick and effective filtering technique we have available when threat hunting with Splunk — namely the "NOT" Boolean operator.
As we've seen, the primary goal while hunting in Splunk is to remove events from the result set that don't help to prove or disprove our hypotheses. The "NOT" operator is a great tool for this purpose.
Let's start by examining some DNS queries captured by Splunk Stream during the month of August 2016. In this case we might be investigating a system that has been behaving suspiciously, or we may be looking for "threads to pull" to help us formulate a hunting hypothesis. Here we will use Splunk to first summarize the requests, then we’ll eliminate data that we can explain — and dig deeper on data that we can't.
The search in Figure 3 (below) yields 234 unique DNS queries, with nothing obviously suspicious or malicious.
Figure 3: A simple search to review DNS activity
sourcetype=stream:dns src=192.168.250.100 query_type{}=A | stats count by query | sort -count
Now let's start filtering using "NOT." First up is to get rid of some DNS lookups that are used for browser configuration and IPv6 tunneling. To accomplish this, we add a couple of "NOT" field-value pairs. Note the use of wildcards to catch instances from different domains.
Figure 4: Starting to filter with NOT
sourcetype=stream:dns src=192.168.250.100 query_type{}=A NOT query=wpad* NOT query=isatap* | stats count by query | sort -count
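The same exclusions can also be grouped under a single "NOT" by combining them with "OR" inside parentheses. This sketch should be logically equivalent to the search above and can be easier to read as the list of exclusions grows:

```spl
sourcetype=stream:dns src=192.168.250.100 query_type{}=A NOT (query=wpad* OR query=isatap*) | stats count by query | sort -count
```

One subtlety worth knowing: "NOT query=wpad*" also matches events that have no query field at all, whereas "query!=wpad*" only returns events where the query field exists and does not match. In this search the difference disappears after "stats count by query", since events without a query value do not contribute a row, but it can matter when you inspect the raw events.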
Here the benefits of our filtering begin to emerge. In this case, we reduced our result set size from 234 to 176. Now let's continue by filtering results from:
Figure 5: More filtering
sourcetype=stream:dns src=192.168.250.100 query_type{}=A
    NOT query=wpad* NOT query=isatap*
    NOT query=*.windows.com NOT query=*live.com NOT query=*nsatc.net NOT query=*windowsupdate.com
    NOT query=*msedge.net NOT query=*trafficmanager.net NOT query=*office.com NOT query=*bing.com
    NOT query=*virtualearth.net NOT query=*msn.com NOT query=*.microsoft.com NOT query=*msftncsi.com
    NOT query=*microsoftonline.com NOT query=demo-01 NOT query=*waynecorpinc.local
    NOT query=*public-trust.com NOT query=*ocsp*.com
    NOT query=*akamaiedge.net NOT query=*akadns.net NOT query=*akamaized.net
    NOT query=sway-cdn.com NOT query=*symc*.com
| stats count by query
We've now reduced the result set all the way down to 15! At this point, it becomes much easier to identify things like:
Figure 6: Visually inspecting the filtered data set
As searches like this grow in size, it often makes sense to consolidate them into a Splunk lookup table. Check out Using the Lookup Command for Threat Hunting (Lookup Before You Go-Go).
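As a rough sketch of that idea, the long chain of "NOT" terms could be replaced by a subsearch that reads the patterns from a lookup file. The file name "<my_dns_allowlist>.csv" is a placeholder, and the sketch assumes the file has a single column named "query" holding patterns such as wpad* and *.microsoft.com:

```spl
sourcetype=stream:dns src=192.168.250.100 query_type{}=A
    NOT [| inputlookup <my_dns_allowlist>.csv | fields query]
| stats count by query
| sort -count
```

The subsearch runs first and expands into a set of query=... terms joined by "OR", so the outer "NOT" excludes any event that matches one of them. Subsearches are subject to result and runtime limits, so this approach fits allowlists of modest size. For larger lists, the lookup-based techniques in the post referenced above may be a better fit.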
Each year, participants challenge themselves by playing in Splunk Boss of the SOC. Time and again, we see that competitors who are at the top of the BOTS leaderboard are those who can quickly distill large amounts of raw search results down to a few key events. Mastering the three simple filtering techniques discussed in this blog post will enable you to become a much more efficient analyst and threat hunter, and might even improve your standing at the next Splunk BOTS event!
Happy hunting!