Baselines are an essential part of effective cybersecurity. They provide a snapshot of normal activity within your network, which enables you to easily identify abnormal or suspicious behavior. Baseline hunting is a proactive approach to threat detection that involves setting up a baseline of normal activity, monitoring that baseline for deviations, and investigating any suspicious activity.
The PEAK Threat Hunting Framework identifies three types of hunts:

- Hypothesis-driven hunts
- Baseline hunts
- Model-Assisted Threat Hunts (M-ATH)
In this article, let's take an in-depth look at baseline hunts, also known as Exploratory Data Analysis (EDA) hunts.
(This article is part of our PEAK Threat Hunting Framework series. Explore the framework to unlock happy hunting!)
Baselining can help you familiarize yourself with new datasets or environments where you've never hunted before. It serves as an excellent precursor to more focused hypothesis-based or model-assisted threat hunting. Before planning and scoping future hunts, it's important to understand the available data sources, their fields, and values. After all, the "K" in PEAK stands for Knowledge!
You can run a baseline hunt at any time, and some situations naturally lend themselves to this type of hunt. For example, when you onboard a new type of security log, baselining that data source will be very helpful to you while you’re trying to figure out how best to use it for detection and response operations.
Another prime baselining opportunity would be when you start hunting in a new environment, such as when you acquire a new company or onboard a new managed security customer. Figuring out what normal activity looks like is a necessary first step in planning any type of monitoring or writing response playbooks.
As with all PEAK hunts, baseline hunts are divided into three major phases: Prepare, Execute, and Act. Let’s examine each of these phases in detail.
All hunts start with the “Prepare” phase. This is where you do all the things necessary to get ready and to ensure a successful hunt. Let’s see what this looks like for a baseline hunt.
The first step is to decide which data source you’d like to baseline. If you're starting from square one, make an effort to baseline all of your critical data sources, starting with the ones your hunt team relies on most or with the most security-relevant sources. If you’re not sure where to begin, prioritize data sources according to their significance to your organization and its detection goals.
Once you’ve determined which data source you’re going to focus on, you’ll want to become as familiar with it as possible. If this is a common log source that many organizations deal with, such as a Windows event log or events from a common security product, a good starting point might be to find out what the vendor has to say about what’s in the data. You’ll want to:

- Review the vendor’s documentation for the log format and its field definitions
- Understand which types of events the source records (and which it doesn’t)
- Learn how the data is generated, collected, and ingested into your analysis platform
While you’re doing your research, don’t forget to include any existing monitoring or detection measures implemented for that data, as well as the individuals or teams responsible for the systems or applications creating the data. The former can help focus future hunts, while the latter will be useful if you have questions about the logs or how to interpret them.
When conducting a hunt, it's important to narrow your focus, especially in larger environments where analyzing all the data at once may not be possible. Different systems may exhibit different behaviors, so it's helpful to group them based on similarities (such as "user desktops" or "application servers") and baseline each group individually. This approach is easier and more likely to yield better results.
Another important decision is the timeframe for data collection. Baselines are created by analyzing normal activity over a period of time, so it's essential to use enough historical data to establish what's normal. However, it's also crucial to balance having enough data with keeping the window size reasonable to avoid being overwhelmed with too much data to analyze. For most sources, between 30 and 90 days of data will probably be fine.
Using what you learned from your research, outline the tools, techniques, and resources you'll need to baseline your data source(s).
Making a good plan helps to ensure the execution phase goes smoothly, so it’s worth spending a little time here.
With your data sources determined and a plan in place, we move into the “Execute” phase.
Following your hunt plan, it's time to collect the data and bring it all back into one place for analysis. In some cases, this may have already happened (for example, if you’re already ingesting the network logs you need into a Splunk index). In other cases, you might have to identify the specific server(s) and locations on disk from which to collect the data.
As part of the data-gathering process, you may also need to filter your dataset according to the system groups and/or timeline you established while scoping the hunt. Large networks may be generating terabytes of data every day. Sifting through this mountain of information manually, or even with automated systems, can be daunting and time-consuming. By filtering your dataset, your analysis will be more efficient and manageable.
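To make this concrete, here is a minimal sketch of the filtering step in Python with pandas. The file name, column names, and group label are all hypothetical placeholders; substitute whatever your environment actually uses (in Splunk, the equivalent would be index, host, and time-range constraints in your search).

```python
import pandas as pd

# Hypothetical export of the logs being baselined; in practice this data
# might come straight from your SIEM instead of a flat file.
logs = pd.read_csv("file_transfer_logs.csv", parse_dates=["_time"])

# Keep only the system group being baselined and the agreed-upon
# historical window (90 days here, per the scoping step above).
window_start = logs["_time"].max() - pd.Timedelta(days=90)
baseline_df = logs[
    (logs["host_group"] == "user_desktops")
    & (logs["_time"] >= window_start)
]
```

The later examples in this article build on this same hypothetical baseline_df.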
A data dictionary is a structured repository of information about the data elements used within a data source. It provides a comprehensive description of the fields in the data source, their characteristics, relationships, and usage. Your data dictionary should contain:

- The name of each field
- A description of what the field represents
- The type of data the field contains
- Example or expected values
- Notable relationships to other fields
When it comes to specifying the types of data for each field, here are some of the most common:

- Strings (e.g., usernames or file names)
- Numbers (integers or decimals)
- Date/time values
- Booleans
- IP addresses
Because data sources can often contain many different fields, you don’t necessarily have to document each and every field in order to have a workable data dictionary. Often, just choosing the fields that seem to be most relevant for security is sufficient. For example, in a file transfer log, fields like account names, file names, transfer commands, and statuses might be more useful than file sizes or average transfer rates.
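As an illustration, here is what a minimal data dictionary for that hypothetical file transfer log might look like, expressed as a Python dictionary so it can double as input to analysis scripts. Every field name, type, and example value below is an assumption for demonstration purposes.

```python
# A minimal data dictionary for a hypothetical file transfer log.
# Field names, types, and example values are illustrative assumptions.
data_dictionary = {
    "account_name": {
        "type": "string",
        "description": "User account that initiated the transfer",
        "example_values": ["jsmith", "svc_backup"],
    },
    "file_name": {
        "type": "string",
        "description": "Name of the file being transferred",
        "example_values": ["report.pdf"],
    },
    "command": {
        "type": "string",
        "description": "Transfer command issued",
        "example_values": ["GET", "PUT"],
    },
    "status": {
        "type": "string",
        "description": "Result of the transfer",
        "example_values": ["success", "failed"],
    },
    "src_ip": {
        "type": "ip_address",
        "description": "Source address of the connection",
        "example_values": ["10.0.4.17"],
    },
}
```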
(Take a deeper dive into how to use data dictionaries.)
In this step, you’ll use descriptive statistics to summarize the values typically found in each of the key fields in your data dictionary. For example, you might compute:

- The total number of events and the number of distinct values in each field
- The most and least common values of each field
- The minimum, maximum, average, and standard deviation of numeric fields
- Typical event volumes per hour, day, or week
Notice that you’re beginning to define normal behavior. These statistical descriptions are the baseline for normal activity.
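Continuing the hypothetical file transfer example, a few lines of pandas can produce these summaries from the baseline_df built earlier (all column names are assumptions):

```python
# Categorical fields: how often does each value appear?
print(baseline_df["command"].value_counts())
print(baseline_df["status"].value_counts(normalize=True))  # as proportions

# Numeric fields: count, mean, standard deviation, min/max, quartiles.
print(baseline_df["bytes_transferred"].describe())

# Distinct counts, e.g., how many unique accounts were active?
print(baseline_df["account_name"].nunique())
```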
Now that you have some idea about what “normal” looks like in your data, you can begin to use your baseline to identify anomalies or outliers that might indicate suspicious activity. There are many techniques for this, but here are a few of the most common:

- Standard deviation: flag values that fall more than a few standard deviations from the mean
- Interquartile range (IQR): flag values that fall far outside the middle 50% of the data
- Frequency analysis: flag values that occur only rarely in the dataset
- Visualization: chart the data and look for points that visibly stand out
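As a sketch, here are two of these techniques (standard deviation and frequency analysis) applied to the hypothetical baseline_df from earlier. The three-sigma and 0.1% thresholds are illustrative assumptions, not recommendations.

```python
# 1. Standard deviation: flag transfers whose size is more than three
#    standard deviations from the mean.
sizes = baseline_df["bytes_transferred"]
z_scores = (sizes - sizes.mean()) / sizes.std()
size_outliers = baseline_df[z_scores.abs() > 3]

# 2. Frequency analysis: flag accounts so rare they may warrant a look,
#    e.g., those appearing in fewer than 0.1% of events.
account_freq = baseline_df["account_name"].value_counts(normalize=True)
rare_accounts = account_freq[account_freq < 0.001].index
rare_account_events = baseline_df[baseline_df["account_name"].isin(rare_accounts)]
```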
(For more on outlier detection for threat hunters, watch our talk on the subject.)
After identifying outliers, you’ll want to investigate each to determine whether they represent security issues or are just benign oddities. It's advisable to seek out correlations or connections between various events or anomalies to uncover any underlying trends or potential security risks.
As with most projects involving data, especially new data you’ve never looked at before, things rarely go entirely smoothly. Gap analysis is where you identify challenges you ran into while hunting and, when possible, take action to either resolve or work around them.
Usually, these challenges will be with the data, though in some cases, you might also call out search or analysis tools that didn’t quite work out. For example, you may find that your initial data collection somehow missed data from certain systems. If you can do without those systems, you may elect to just carry on as normal, but if those systems are key to your hunt, you may need to revisit the “Gather Data” step in order to collect the additional data.
This step also includes validating and documenting whether all valuable fields and values are parsed and extracted correctly.
So far, we’ve looked at the data on a field-by-field basis, but it’s important to understand that any non-trivial dataset is also likely to exhibit relationships between the values in different fields. These relationships can hold critical insights, often providing much more context about the event than you can get just by examining individual data points. A classic example is the count of user logins and how they relate to the time of day, with an increase expected during the start of the typical work shift.
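Using the running hypothetical example, a quick way to surface this kind of relationship is to count events by hour of day (the "LOGIN" value below is an assumed field value):

```python
# Count hypothetical login events by hour of day to expose the
# relationship between login volume and working hours.
logins = baseline_df[baseline_df["command"] == "LOGIN"]
logins_by_hour = logins.groupby(logins["_time"].dt.hour).size()
print(logins_by_hour)
```

A spike around the start of the workday is expected; a similar spike at 3 a.m. would be an outlier worth a closer look.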
Building on our new foundation of knowledge, we can improve our defensive efforts as well as make future hunting efforts easier and more effective. Time to take some action!
Don't let your hard work disappear. Save your hunt data, including the tools and methods you used, so you can look back at it later or share it with other hunters. Many hunt teams use wiki pages to keep track of each hunt, recording details such as:

- The topic and scope of the hunt
- The data sources and time period examined
- The queries, tools, and techniques used
- Findings, outliers investigated, and any follow-up actions
Often, hunters look back at previous hunts when they face similar situations in the future. Do yourself a favor and make sure to document your hunting process. Your future self will thank you.
Your baseline consists of the data dictionary, statistical descriptions, and field relationships. Even if you took good notes during the "Execute" phase, it's important to turn those notes into a document that others can understand.
Almost any large dataset will have suspicious-looking but benign anomalies. Don't forget to include a list of these known-benign outliers! Documenting those you already identified and investigated during the “Investigate Outliers” phase will save time during future hunts and incident investigations.
(Make the most of each investigation with these postmortem best practices.)
Since you now have some idea about what “normal” looks like in your data, and you probably also have a little experience investigating some of the outliers, you may be able to distill all of this into some automated detections. Examine each of your key fields or common relationships you identified between fields to see if there are certain values or thresholds that would indicate malicious behavior. If so, consider creating rules to automatically generate alerts for these situations.
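For illustration only, here is a short Python sketch of what such a rule might look like. The 500 MB threshold and off-hours window are made-up values, and in practice you would express this logic as a scheduled search or correlation rule in your SIEM.

```python
# Illustrative thresholds derived from a (hypothetical) baseline; tune
# these to your own environment before alerting on them.
OFF_HOURS = set(range(0, 6))      # 00:00-05:59 local time
MAX_BYTES = 500 * 1024 * 1024     # 500 MB per transfer

def should_alert(event: dict) -> bool:
    """Flag file transfers that exceed both baseline-derived thresholds."""
    off_hours = event["hour"] in OFF_HOURS
    oversized = event["bytes_transferred"] > MAX_BYTES
    return off_hours and oversized

# A 2 GB transfer at 03:00 trips the rule:
print(should_alert({"hour": 3, "bytes_transferred": 2 * 1024**3}))  # True
```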
This may not always be feasible, so don’t worry if you aren’t able to identify good alerting candidates. Baselines are all about identifying abnormal activity, but just because something is abnormal doesn’t mean it’s malicious. Simply alerting on any abnormal behavior is likely to cause a flood of low-quality alerts. The trick with alerting is to identify outliers that are most likely to signal malicious behavior.
Also, even if anomalies aren’t good candidates for automated alerting, they may still be useful as reports or dashboard items that an analyst can manually review on a regular basis or even as starting points for future hunts.
As with all types of hunts, baselines are most impactful when you share them with relevant stakeholders to improve overall security posture. In addition to sharing with the owners of the system you baselined, you’ll want to be sure that your SOC analysts, incident responders, and detection engineers are aware that the baseline exists and that they have easy access to it. If your security team keeps a documentation wiki or other knowledge repository, that would be a great place to collect all your baselines. You might also consider linking to the baselines from the playbooks that your SOC analysts use to triage alerts.
Because so much of incident detection, response, and threat hunting relies on identifying deviations from normal behavior, good baselines are crucial for any environment or dataset. Baseline hunts let you discover not only what “normal” looks like but also what the expected benign anomalies are – information that hunters, SOC analysts, incident responders, and detection engineers need in order to do their jobs effectively.
Baseline hunts are also valuable precursors to hypothesis-based or model-assisted threat hunting. So take the time to establish good baselines in your environment and improve your ability to detect and respond to potential security threats.
As always, security at Splunk is a family business. Credit to authors and collaborators: David Bianco, Ryan Fetterman