If you have been reading our hunting series, you may have noticed that many threat hunting techniques center on network-centric data sources. Thus far, we have yet to speak about the big kahuna in our hunting tool chest. We are rectifying that right here, right now: we are going to talk about Microsoft Sysmon!
In this article, we’re looking at using Sysmon to hunt for threats in endpoints. We’ll highlight some of the most valuable places to start hunting in your Windows logs. While not an exhaustive list, these tips will help you build hypotheses and provide a good starting point for hunting on your endpoints. We’re covering a TON in this article.
(Part of our Threat Hunting with Splunk series, this article was originally co-written with John Stoner. We’ve updated it recently to maximize your value.)
Splunk's security team is addicted to using Sysmon for endpoint data. (In fact, Sysmon is so much fun to use it almost makes some of us want to go back into operational security...almost.)
Sysmon is a valuable addition to your arsenal: gathering these events opens up your world to greater insight into what your Windows systems are doing!
As anyone who has Splunked a Windows machine knows, they are a bit…chatty. The good news is that not only can the universal forwarder bring in event logs, but when you combine it with Splunk Add-On for Microsoft Sysmon, you can also collect:
This flexibility provides an analyst looking to hunt with an array of options. And now let’s jump into endpoints, Sysmon and all the nitty gritty event codes.
We have written a good bit about the virtue of endpoint monitoring. In fact, James Brodsky punches his ticket to .conf every year with a deep dive into endpoints that we have turned into a workshop on just that topic. Of course, we cover Sysmon right here in our Hunting with Splunk series. Also, Shannon Davis talked about chaining Sysmon processes using a tool on Splunkbase called pstree.
What does all this tell us? Endpoint monitoring is important! We like using Sysmon, particularly Event Code 1 - Process Creation, to gain fidelity into programs starting on our systems. So far, so good.
Which leads us to this next question:
Short for system monitor, Sysmon is a detection technology — it's not for prevention. Many other products perform blocking/prevention, but if we need insight into what's happening, Sysmon provides an excellent, cost-effective method. (You can download it directly from Microsoft.)
Mark Russinovich and the Sysinternals team have built many great Windows utilities and tools. Debuting in 2014, Sysmon continues that tradition following Sysinternals' acquisition by Microsoft. Sysmon provides more in-depth insight into what's happening on a Microsoft system and can tell you…
Beyond these killer features, it can also report network connections from a host and many other system states that provide greater insight than if you only used Windows Event logs.
You could write a book on configuring Sysmon. We are going to touch on it here only as it relates to threat hunting. We know what you're thinking: “You want me to log everything my workstations do?” We're not suggesting that — but we are suggesting that with a little bit of tuning you can get the essential nuggets out of Sysmon.
The first thing you need is to configure a monitoring template to determine what will be collected. There are a number of these templates out there already. In fact, we like the SwiftOnSecurity Sysmon template available on GitHub to jump start our configuration. Fork away!
"That’s great," you might say, "You give us a template, but I am still concerned about generating too many events — what now?"
Well, we'll also point you toward the work that TransAlta did. TransAlta presented at .conf2017 and highlighted how they were able to filter down Sysmon to about 10MB per day per workstation and still gain actionable information. (Watch the presentation in MP4 or check out the PDF.)
With this background out of the way, let’s dive into Sysmon.
First things first: Windows creates a lot of data. Ugh. Knowing where to start and what to look for is important. What we’ll look at is by no means an exhaustive list of all the places to hunt in Windows logs. (Check out Ryan Kovar’s article on Spotting the Adversary for more interesting Event Codes).
Now the good news. Not only has the NSA published the hardest-working event codes for hunting and detection, but many other authorities, experts and researchers have, too. Depending on the adversary and their tooling, some of their actions may be difficult to see. Still, when an adversary starts living off the land and using binaries that are native to Windows, some of these event codes can pay off in ways you may not have previously imagined.
A free app from Splunkbase, Windows Event Code Security Analysis for Splunk maps these practitioner recommendations (13 of them, at last count) into a set of lookup tables within Splunk so that you can make an educated decision about what to collect and what to filter out.
Event codes recommended by lots of different authoritative sources are ones you likely should be collecting. But if codes are recommended by only one or two sources, maybe you can do without.
So, let’s look at some Event Codes that most everyone agrees are good places to get started and tips for hypothesizing as you conduct your hunting.
The first Windows Event Code to talk about is Event Code 4688. It may very well be the most important event code that exists. Windows defines Event Code 4688 as “A new process has been created,” but it’s so much more: any process (or program) that is started by a user, or even spawned from another process, is logged with this event ID.
For instance, if a Windows PC is infected with malware or a virus, searching code 4688 will show any processes that were created by that malware. From a hunting perspective, I could hypothesize that rare processes may contain malicious activity and as such, I want to focus my hunt on them. To do that, I can search Windows data in Splunk with something like:
sourcetype="wineventlog:security" EventCode=4688 | stats count, values(Creator_Process_Name) as Creator_Process_Name by New_Process_Name | table New_Process_Name, count, Creator_Process_Name | sort count
The search above returns newly created processes along with the process that created each one (if it was created by a parent process), sorted so the rarest appear first. Why is this information important? Every child process records the process that spawned it, so you can trace a suspicious process back to its parent and forward to anything it launched. This helps find malicious processes that were created and provides the information you need to clean up the infection.
If you take this search a step further, you could focus on processes that are…
By identifying rare processes on your machine, you will have insight that you might not have otherwise.
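For readers who want to see the logic of the rare-process search outside of Splunk, here is a minimal Python sketch of the same aggregation. Only the field names New_Process_Name and Creator_Process_Name come from the search above; the sample events and the function name rare_processes are made up for illustration.

```python
from collections import defaultdict

def rare_processes(events):
    """Count process creations per New_Process_Name, rarest first.

    Roughly mirrors:
    stats count, values(Creator_Process_Name) by New_Process_Name | sort count
    """
    counts = defaultdict(int)
    creators = defaultdict(set)
    for event in events:
        name = event["New_Process_Name"]
        counts[name] += 1
        creator = event.get("Creator_Process_Name")
        if creator:
            creators[name].add(creator)
    rows = [(name, counts[name], sorted(creators[name])) for name in counts]
    # Ascending count puts the rarest processes at the top of the list.
    return sorted(rows, key=lambda row: (row[1], row[0]))

# Hypothetical sample events, not real log data.
sample_events = [
    {"New_Process_Name": "svchost.exe", "Creator_Process_Name": "services.exe"},
    {"New_Process_Name": "svchost.exe", "Creator_Process_Name": "services.exe"},
    {"New_Process_Name": "svchost.exe", "Creator_Process_Name": "services.exe"},
    {"New_Process_Name": "cmd.exe", "Creator_Process_Name": "explorer.exe"},
    {"New_Process_Name": "cmd.exe", "Creator_Process_Name": "winword.exe"},
    {"New_Process_Name": "odd.exe", "Creator_Process_Name": "wscript.exe"},
]

for name, count, parents in rare_processes(sample_events):
    print(count, name, parents)
```

In this made-up data, odd.exe surfaces first because it ran only once, which is exactly the kind of process the hypothesis says to look at first.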
(Maximize the value you get from Event Code 4688.)
Now onto 4738; it's one of my personal favorites — “A user account was changed.” This event is logged whenever a user account is altered, which is especially important when an account is granted Administrator privileges in a domain or on a standalone Windows machine.
I love hunting for this event and looking at anything that occurs within 2 minutes on either side of it. When adversaries (hackers or your own employees) are malicious, they often attempt to “elevate” permissions on a user account.
Pro tip: Wrapping a search in square brackets runs it as a subsearch (a search within the search). The subsearch narrows things down to our EventCode=4738 event(s), and the surrounding SPL pulls in the events from 2 minutes before and after each one for context.
Take a look at this example:
index=wineventlog [search index=wineventlog sourcetype=WinEventLog* EventCode=4738 | eval earliest=_time-120 | eval latest=_time+120 | fields host, earliest, latest] | table host, sourcetype, EventCode, Message
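The windowing idea behind that subsearch can be sketched in a few lines of Python. This is an approximation under stated assumptions: events are plain dictionaries with host, _time (epoch seconds) and EventCode keys, the sample data is invented, and boundary handling is simplified compared to how Splunk treats earliest and latest.

```python
def events_near(events, anchor_code=4738, window_seconds=120):
    """Return events on the same host within window_seconds of an anchor event."""
    anchors = [
        (e["host"], e["_time"]) for e in events if e["EventCode"] == anchor_code
    ]
    nearby = []
    for event in events:
        for host, anchor_time in anchors:
            same_host = event["host"] == host
            in_window = abs(event["_time"] - anchor_time) <= window_seconds
            if same_host and in_window:
                nearby.append(event)
                break  # one match is enough; avoid adding the event twice
    return nearby

# Hypothetical events; _time values are arbitrary epoch seconds.
sample_events = [
    {"host": "ws1", "_time": 1000, "EventCode": 4624},
    {"host": "ws1", "_time": 1100, "EventCode": 4688},
    {"host": "ws1", "_time": 1150, "EventCode": 4738},
    {"host": "ws1", "_time": 1260, "EventCode": 4720},
    {"host": "ws1", "_time": 1300, "EventCode": 4688},
    {"host": "ws2", "_time": 1150, "EventCode": 4688},
]

for event in events_near(sample_events):
    print(event["host"], event["_time"], event["EventCode"])
```

Only the three ws1 events between 1030 and 1270 come back; the ws2 event at the same moment is excluded because the window is scoped per host, just as the subsearch returns host alongside earliest and latest.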
Event Code 4624 is created when an account successfully logs into a Windows environment. This information can be used to create a user baseline of login times and location.
This allows Splunk users to spot outliers from normal login behavior, which may point to a malicious intrusion or a compromised account. Event Code 4624 also records the different types of logons, for instance network or local. Using this information, you can find outliers within your network by filtering on time or even logon type.
Try a search like:
sourcetype="wineventlog:security" EventCode=4624 | eventstats avg("_time") as avg stdev("_time") as stdev | eval lowerBound=(avg-stdev*exact(2)), upperBound=(avg+stdev*exact(2)) | eval isOutlier=if('_time' < lowerBound OR '_time' > upperBound, 1, 0) | table _time, isOutlier, body
It should produce a list of events and tell you whether each one is a statistical outlier.
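The arithmetic in that search is a two-standard-deviation bound around the mean event time. A minimal Python sketch of the same calculation follows; the timestamps are invented, and the function name flag_time_outliers is ours, not part of any Splunk tooling.

```python
from statistics import mean, stdev

def flag_time_outliers(timestamps, multiplier=2):
    """Flag timestamps more than `multiplier` standard deviations from the mean."""
    avg = mean(timestamps)
    spread = stdev(timestamps)  # sample standard deviation
    lower_bound = avg - spread * multiplier
    upper_bound = avg + spread * multiplier
    return [
        (t, 1 if t < lower_bound or t > upper_bound else 0) for t in timestamps
    ]

# Hypothetical logon times in epoch seconds: nine close together, one far away.
sample_times = [1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 5000]

for timestamp, is_outlier in flag_time_outliers(sample_times):
    print(timestamp, is_outlier)
```

Because the comparison uses the raw timestamp, it flags logons that sit far from the bulk of activity in the search window. Baselining something like hour of day per user would need a different field than _time, so treat this as a starting point for the hypothesis rather than a finished baseline.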
In nearly three decades of my career, I can only remember one time that I cleared the event logs on a Windows machine to troubleshoot a service. Event Code 1102 occurs when an administrator or administrative account clears the audit log on Windows. It’s not something that should be used often — but when it is, it might be to cover something up.
I’d recommend having this as a “Critical” event in your SIEM (aka Splunk Enterprise Security), but it's also worth hunting for. It's important to note that if you’re Splunking your important Windows servers, this “event clearing” will have no effect on your visibility, since all your logs are already in Splunk.
Now let’s take a look at some “lesser knowns”. I will call these the B-sides, which I realize may be lost on some readers, but for others, you may recall albums or cassettes that had some seriously good stuff on the b-side, maybe in some cases better than what was on the a-side. In fact, you may even want to utilize some of these in your detections to drive more automation!
As of this writing, there are Sysmon event codes from 1-26 (not counting 255, which denotes an error). It would be fairly tedious to go through every single code here, and it is important to point out that configuration needs to be performed to get the most out of your Sysmon events. We aren’t going into depth on that part here, but with some good templates (namely the SwiftOnSecurity configuration) and the Splunk Add-On for Microsoft Sysmon, you’ll be good to go.
Event Code 1, Process Create, has been covered elsewhere, so we won’t go through it today. Suffice to say, this is the workhorse event for seeing what is happening on a system in terms of processes being executed and from where. It is always a handy code, so learn it, know it, live it and be a full hot orator.
DNS Query, aka Sysmon event code 22, can be very handy to get a feel for the DNS queries being issued by a specific host and in conjunction with a specific image. When we use the term image, we mean the process and its associated process ID, path and GUID. Both the query and the result are available in the event as well.
In this example, Bud Stoll’s system is using Microsoft Edge to look up www.blogger.com and gets an IP address back in response. This is benign traffic, but changing the Image value on a suspect host could yield greater insight into the domains it queries.
source="xmlwineventlog:microsoft-windows-sysmon/operational" EventCode=22 EventDescription="DNS Query" host="bstoll-l" Image="C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe" QueryName="www.blogger.com"
File Delete archived, event code 23, can be helpful when looking for an adversary bent on destruction or covering their tracks. Event code 26, File Delete logged, is similar, but event code 23 will also save the deleted file in the ArchiveDirectory. This could result in a very large directory, so take care to configure this smartly and in a way that delivers value.
Either way, a combination of event code 23 and 26 could provide insight into a specific process:
Here, we are deleting a link to an Excel spreadsheet from within Microsoft Excel. This example is clearly benign, but could be used to look for more malicious activities.
source="xmlwineventlog:microsoft-windows-sysmon/operational" EventDescription="File Delete archived" Image="C:\\Program Files (x86)\\Microsoft Office\\root\\Office16\\EXCEL.EXE"
WMI Events are coded as event codes 19-21 and can be helpful to understand when filters and consumers are created.
Short for Windows Management Instrumentation, WMI is present on all Windows systems, can be used for scripting and is being used more heavily by adversaries. In fact, MITRE ATT&CK specifically calls out Windows Management Instrumentation as an adversary technique.
Pipe creation is denoted as event code 17 and can be useful for identifying lateral movement. The pipe name is provided as well as the image information. (Are you seeing a pattern yet?)
Pipes are seen extensively in Windows environments, so this event code alone does not indicate malicious activity; further inspection is required to hunt for badness. That said, it can be an additional event to provide further context during a hunt.
If you want to determine what files have been created on a system, event code 11 is a good one to consider. This event code can get very noisy, however, depending on which directories and file types it is monitoring, so some thought is needed here. This event provides:
Here we can see that we ran a process called setup.exe and the resultant files created and their associated paths:
source="xmlwineventlog:microsoft-windows-sysmon/operational" EventCode=11 host=ghoppy-l process_name=setup.exe | sort _time | table file_create_time file_path process_name host
Lastly, let’s look at event code 13, registry value set. When activities are occurring on a system, it is common for registry settings to be added, deleted or modified. These modifications can certainly create a large number of events, but they contain a treasure trove of activities:
OK, now let’s stop with the event codes, even though there are many other Sysmon events of value.
Now let’s go hunting! We’ll walk through an actual tutorial for threat hunting in Sysmon. Let’s start by taking a look at the details found in Sysmon.
If you want to follow along at home and are in need of some sample data, then consider looking at the “BOTS V3 dataset on GitHub”. Note: All of the searches below were tested on the BOTSv1 data found here.
In our example below, we can see that the executable 3791.exe is being executed from the directory c:\inetpub\wwwroot\joomla. The EventDescription of Process Create is one of many kinds of events collected by Sysmon, but the process creations alone can be incredibly useful when hunting.
As we continue to look through the event, we notice a field called ParentCommandLine. This field contains the value cmd.exe /c "3791.exe 2>&1", which is the command line of the parent process of 3791.exe.
Now that we have a background in the data found in Sysmon, let's apply that to a hunt.
During our hunt, we have identified file 121214.tmp on a workstation. We need to understand more about this file and what relationships it might have with other processes. Let’s start with a simple search like this:
index=botsv1 sourcetype=XmlWinEventLog:Microsoft-Windows-Sysmon/Operational CommandLine=*121214.tmp* | table CommandLine
Here we can see all of the instances where 121214.tmp showed up in the command line. These instances provide some interesting information including that this process was killed or that cmd.exe runs and then triggers 121214.tmp to run.
When we hunt, we likely want more context than only what is executed. We could craft a search that can gather what was executed and associate it with its parent process to understand what process executed and then what preceded and followed it:
index=botsv1 sourcetype=XmlWinEventLog:Microsoft-Windows-Sysmon/Operational 121214.tmp CommandLine=* | table _time CommandLine ProcessId ParentProcessId ParentCommandLine | reverse | sort _time, ParentCommandLine
In this case, we can track our file by looking at the ProcessId and ParentProcessId and their associated CommandLine and ParentCommandLine values and chain them together. We can also take this a step further and continue searching for other ProcessIds that match our ParentProcessId to see their relationships.
With our search, we see a series of processes that executed, concluding with 121214.tmp terminated by the taskkill command and a ping of the loopback address occurring. The other thing we can see is that the ParentCommandLine where the 121214.tmp was first seen is a wscript.exe that calls the file 20429.vbs from Bob Smith’s roaming profile directory. Wscript.exe is a legitimate Windows application — it is used to run VBScript files.
Fortunately, because we are logging this with Sysmon, we can easily modify our search and look for CommandLine and ParentCommandLine when either the parent process or the process ID is 3968.
index=botsv1 sourcetype=XmlWinEventLog:Microsoft-Windows-Sysmon/Operational (ProcessId=3968 OR ParentProcessId=3968) CommandLine=* | table _time CommandLine ProcessId ParentProcessId ParentCommandLine | reverse
Now we can see what appears to be VBScript executed directly from cmd.exe. We could continue iterating further and further back if our hunt required it.
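Iterating back through parents by hand is really a walk up a chain of ProcessId and ParentProcessId links. Here is a minimal Python sketch of that walk, assuming the Sysmon process creation events have already been exported as dictionaries. The process IDs and command lines below are invented for illustration and are not taken from the BOTS data.

```python
def ancestry(events, process_id):
    """Follow ParentProcessId links upward and return (pid, CommandLine) pairs."""
    by_pid = {e["ProcessId"]: e for e in events}
    chain = []
    seen = set()
    current = process_id
    # Stop when the parent was not logged, or if the links ever loop.
    while current in by_pid and current not in seen:
        seen.add(current)
        event = by_pid[current]
        chain.append((current, event["CommandLine"]))
        current = event["ParentProcessId"]
    return chain

# Hypothetical process creation events (made-up IDs and command lines).
sample_events = [
    {"ProcessId": 100, "ParentProcessId": 50, "CommandLine": "cmd.exe /c start.bat"},
    {"ProcessId": 200, "ParentProcessId": 100, "CommandLine": "wscript.exe example.vbs"},
    {"ProcessId": 300, "ParentProcessId": 200, "CommandLine": "payload.tmp"},
    {"ProcessId": 400, "ParentProcessId": 60, "CommandLine": "notepad.exe"},
]

for pid, command_line in ancestry(sample_events, 300):
    print(pid, command_line)
```

One caveat with this simplification: Windows reuses process IDs over time, so on a long-running host a lookup keyed only on ProcessId can link unrelated processes. Sysmon also records ProcessGuid and ParentProcessGuid, which are a safer key for the same walk.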
The last thing we will touch on is operationalizing a hunt. Ideally, we should not be hunting for the same things over and over. (Something about doing the same thing and expecting different results?) Instead, when we find something of value, we recommend operationalizing it so that it alerts the incident response team.
Using our previous example, we identified what could safely be referred to as an exceptionally long string in the CommandLine, and we want to alert on these kinds of events in the future. Using the eval command we discussed earlier in this series, we could build a search like this:
index=botsv1 sourcetype=XmlWinEventLog:Microsoft-Windows-Sysmon/Operational CommandLine=* | eval lenCL=len(CommandLine) | where lenCL>1000 | table _time CommandLine ProcessId ParentProcessId ParentCommandLine lenCL | sort - lenCL
In this search, we calculated the length of the CommandLine field and then filtered on lengths of CommandLine that were greater than 1000. If we wanted to use a calculation other than a fixed value, we could use the stats command to calculate a standard deviation instead. Any event that returns could be sent for investigation and action by the incident response team.
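To compare the two thresholding options side by side, here is a minimal Python sketch: one function applies the fixed limit of 1000 characters from the search above, and the other derives a limit from the mean and standard deviation of the observed lengths. The function names and sample command lines are invented, and the multiplier of 2 is an assumption rather than a recommendation.

```python
from statistics import mean, stdev

def long_command_lines(command_lines, fixed_limit=1000):
    """Return command lines longer than a fixed number of characters."""
    return [c for c in command_lines if len(c) > fixed_limit]

def unusually_long_command_lines(command_lines, multiplier=2):
    """Return command lines longer than mean + multiplier * standard deviation."""
    lengths = [len(c) for c in command_lines]
    threshold = mean(lengths) + multiplier * stdev(lengths)
    return [c for c in command_lines if len(c) > threshold]

# Hypothetical command lines; the last one stands in for an encoded payload.
sample_command_lines = [
    "ping 127.0.0.1",
    "cmd.exe /c dir",
    "whoami",
    "ipconfig /all",
    "net user",
    "tasklist",
    "powershell -enc " + "A" * 1500,
]

print(len(long_command_lines(sample_command_lines)))
print(len(unusually_long_command_lines(sample_command_lines)))
```

The fixed limit is simple to explain and tune, while the statistical limit adapts to whatever is normal in the data being searched. With very small samples a single extreme value inflates the standard deviation, so the adaptive version works best over a reasonably large set of events.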
We have only scratched the surface of what Microsoft Sysmon can do, but we hope that this was enough for you to go check it out, install it and use it as part of your hunt.
Happy Hunting! 😀