Splunk builds innovative tools that enable users, their teams, and their customers to gather millions of data points per second from an ever-growing number of sources. Splunk helps users leverage that data to deliver, monitor, improve, and secure systems, networks, data, products, and customers with industry-leading solutions and expertise.
The Splunk Threat Research Team (STRT) is responsible for identifying, researching, understanding, and detecting threats, from the critical vulnerabilities that dropped on Twitter to those suspicious PowerShell scripts that just ran on the Domain Controller, and for building detections customers can run today on their Splunk Enterprise servers. The STRT believes in the power of community contributions, the power of transparency, and the value of “showing your work.” That’s why the STRT makes all of its detections and its nightly testing framework freely available to anyone at research.splunk.com and through the Enterprise Security Content Update App on Splunkbase. Today, the STRT builds on that transparency in the culmination of the Detection Testing Blog Series.
Readers following the series have watched our progress towards building a more complete tool to aid in the generation of attack datasets and the development and validation of threat detections. The team’s basic goal is simple — a flexible, scalable, automated detection testing pipeline:
In pursuing that goal, STRT built a set of tools and documented them in a series of blog posts. They’re all worth the read, but in summary:
In the EC2 workflow, testing could get stuck, take days, or the environment could be in an indeterminate state - Courtesy https://eol.jsc.nasa.gov/SearchPhotos/photo.pl?mission=ISS064&roll=E&frame=48480, by NASA, Public Domain (with edits)
Jump to Summer 2021. The STRT had grown and so had the number of detections being written and updated. At that time, the STRT actively maintained over 600 Splunk Analytics under Splunk Security Content. In response to this growth, a few changes were made to speed up the testing and development workflow. Most notably, instead of regenerating data every time a test was run, raw data was generated once, captured, and stored for replay in the Attack Data repo. The team released and presented the initial idea for Attack Data during Splunk .conf20; this repo has become a powerful tool for STRT testing and a great resource for customers, too! It catalogs gigabytes of freely available, organized, curated attack data that can be used for learning, testing, and writing novel detections for Splunk or other tools. While this change cut detection testing time from 30 minutes per detection to several minutes per detection, there was still room for improvement:
With a fresh look at the strengths and weaknesses of the current system, the STRT decided to iterate one more time!
The first “aha!” moment occurred during migration from STRT’s legacy CI/CD solution, CircleCI, to GitHub Actions. GitHub Actions is powerful, flexible, and free (for public repositories). GitHub Actions can be configured to run when almost anything happens in a repo: pushes, pull requests, comments, issues, and even scheduled events. When an Action runs, it receives full control of a fresh VM called a Runner that exists for the duration of the Action. This is critical for a number of reasons:
Breaches are for whales, not your data. Start validating security detections today with Splunk Docker Containers - Courtesy https://unsplash.com/photos/JRsl_wfC-9A, by Mike Doherty (with edits)
Splunk Docker provides the ability to easily start, configure, and destroy Splunk Enterprise servers on demand, but to tie everything together, the docker-detection-tester.py tool was built. Specifically, this tool does the following:
Since each test runs independently and all the heavy lifting occurs inside of the containers themselves, the attack data replays and detection searches on different containers never interfere with one another! The diagram provides a logical walkthrough of how the tool runs a test.
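The isolation described above can be illustrated with a small work-queue sketch. This is not the code of docker-detection-tester.py: the function names, the `splunk-N` container names, and the placeholder `run_detection_test` are all assumptions made for illustration. Each worker thread stands in for one container and pulls detections from a shared queue, so no two workers ever handle the same detection.

```python
import queue
import threading


def run_detection_test(container_name, detection):
    """Hypothetical placeholder: a real tester would replay attack data and
    run the detection search inside the named container."""
    return {"detection": detection, "container": container_name, "passed": True}


def worker(container_name, pending, results, lock):
    # One worker per container; it keeps taking work until the queue is empty.
    while True:
        try:
            detection = pending.get_nowait()
        except queue.Empty:
            return
        outcome = run_detection_test(container_name, detection)
        with lock:
            results.append(outcome)


def run_all_detections(detections, num_containers):
    pending = queue.Queue()
    for detection in detections:
        pending.put(detection)

    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(f"splunk-{i}", pending, results, lock))
        for i in range(num_containers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


outcomes = run_all_detections([f"detection_{i}" for i in range(20)], num_containers=4)
```

Because the queue hands each detection to exactly one worker, a slow test on one container delays only that container while the others keep draining the queue.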
By eliminating AWS Batch and moving from EC2 VMs to Docker containers for testing, true detection testing portability was achieved. Test runs can be customized to fit different needs. For example, with minimal setup, tests can run on:
While the ability to test in GitHub Actions was perfect for a small number of detections, it was still impossible to test a very large number of detections in a single job. Currently Splunk Security Content has over 600 detections. Even if each one takes just 60 seconds to test, the full set needs about 10 hours, while the GitHub Actions maximum job execution time is only 6 hours (enough for about 360 detections). The STRT determined a better, faster way to scale testing using the GitHub Actions Matrix Configuration. This feature is primarily used to test builds against multiple configurations, like different application or operating system versions. For example, a developer may want to test a Python library against Python 2.7, 3.9, and 3.10 on Ubuntu 20.04, Windows Server 2022, and macOS Big Sur. A matrix can generate up to 256 jobs per workflow run, each on its own Runner.
A simple Matrix Configuration starts 9 tests at once (3 OS versions times 3 Python versions = 9 configurations). The versions running with Python 2.7 fail on Windows, macOS, and Ubuntu.
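Conceptually, a matrix expands into the Cartesian product of its axes. The sketch below reproduces the 3 x 3 example in plain Python. It only illustrates the idea and is not how GitHub Actions implements it; `expand_matrix` is a hypothetical helper name.

```python
from itertools import product


def expand_matrix(**axes):
    """Return one job configuration per combination of axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


jobs = expand_matrix(
    os=["Ubuntu 20.04", "Windows Server 2022", "macOS Big Sur"],
    python=["2.7", "3.9", "3.10"],
)

for job in jobs:
    print(job["os"], "/ Python", job["python"])
```

Adding a third axis multiplies the job count again, which is why matrix sizes grow quickly.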
The GitHub Actions Matrix makes it possible to scale the testing framework by increasing the number of tests executing in parallel. For example, dynamically splitting 600 detections into 10 parallel detection test jobs means just 60 detections per job. This lets detection testing complete in 1/10th of the time and avoids the 6-hour maximum job execution time limit.
10 GitHub Actions Runners means 1/10th the time
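The arithmetic and the split can be sketched as follows. The 6-hour limit, the 60-second test time, and the 600-detection count come from the text; the function names, the file names, and the round-robin splitting strategy are assumptions for illustration and may differ from what the STRT workflow actually does.

```python
import math

MAX_JOB_SECONDS = 6 * 60 * 60  # 6-hour job limit mentioned in the text


def min_jobs_needed(num_detections, seconds_per_detection,
                    max_job_seconds=MAX_JOB_SECONDS):
    """Smallest number of jobs so that every job fits inside the time limit."""
    per_job = max_job_seconds // seconds_per_detection
    if per_job == 0:
        raise ValueError("a single detection does not fit in one job")
    return math.ceil(num_detections / per_job)


def shard(detections, num_jobs, job_index):
    """Round-robin split: job k takes items k, k + num_jobs, k + 2 * num_jobs, ..."""
    return detections[job_index::num_jobs]


# Hypothetical file names standing in for the real detection files.
detections = [f"detection_{i:03d}.yml" for i in range(600)]
shards = [shard(detections, 10, i) for i in range(10)]

print(min_jobs_needed(600, 60))  # jobs required to stay under the limit
print([len(s) for s in shards])  # detections assigned to each of the 10 jobs
```

Two jobs would already satisfy the limit at 60 seconds per detection; using ten is a choice for speed, not a requirement of the limit.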
To enable parallel testing for scalability, the GitHub Actions Workflow was broken down into three parts:
The final GitHub Actions Workflow - 619 detections in under 50 minutes! Notice the presence of the SummaryTestResults and DetectionFailureManifest files
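Once the parallel jobs finish, their results have to be combined into a single report. The source does not describe the format of the SummaryTestResults or DetectionFailureManifest files, so the sketch below assumes a hypothetical record shape (a dict with `name` and `passed`) purely to show the merge step.

```python
def summarize(job_results):
    """Merge per-job result lists into overall counts and a list of failures.

    job_results: list of lists; each inner list holds dicts with the
    hypothetical keys 'name' (str) and 'passed' (bool).
    """
    all_results = [result for job in job_results for result in job]
    failures = sorted(r["name"] for r in all_results if not r["passed"])
    summary = {
        "total": len(all_results),
        "passed": len(all_results) - len(failures),
        "failed": len(failures),
    }
    return summary, failures


example_jobs = [
    [{"name": "detection_a", "passed": True}, {"name": "detection_b", "passed": False}],
    [{"name": "detection_c", "passed": True}],
]
summary, failures = summarize(example_jobs)
```

Keeping the failure list separate from the counts makes it easy to rerun only the detections that failed.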
Below is a table summarizing the results of the CI/CD testing system iterations. It includes how long each system took to start, test 1 detection, test 600 detections, and the system’s cost.
| Test System | Startup Time | Time to Test 1 Detection | Time to Test 600 Detections | Cost | Use Case |
| --- | --- | --- | --- | --- | --- |
| Before AWS Batch | N/A | Manual | N/A | N/A | Deprecated |
| AWS Batch | N/A | 5 minutes | 2 days | $0.50 per hour (always running)* | Legacy Solution |
| Docker-Based (GitHub Actions, 1 runner) | 5 minutes | 1 minute | 600 minutes (max job time 360 minutes!) | Free (for public repos)** | Test new or changed detections per Commit / PR |
| Docker-Based (GitHub Actions, 10 runners) | 5 minutes | 6 seconds (average) | 50 minutes | Free (for public repos)** | Nightly Testing of all detections in repository |
| Docker-Based (Local Machine, 1 container) | 5 minutes | 1 minute | 600 minutes | Free (plus electricity) | Initial Detection Development and Troubleshooting |
| Docker-Based, 32 containers (AWS c6i.32xlarge - 128 vCPU, 256 GB RAM, io2 Storage) | 5 minutes | 1.5 seconds (average) | 17 minutes | $5.44 per hour (on-demand)* | On-demand, rapid testing of large changes or new baselines |
* https://calculator.aws/#/
** https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions
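To put the hourly rates in the table into perspective, the sketch below converts them into a rough cost per run and per month. It assumes cost scales linearly with running time and ignores storage, data transfer, and billing granularity, so the results are estimates only; the function names are hypothetical.

```python
def run_cost(rate_per_hour, startup_minutes, test_minutes):
    """Approximate cost of one on-demand run, billed for startup plus testing."""
    return rate_per_hour * (startup_minutes + test_minutes) / 60


def always_on_monthly_cost(rate_per_hour, days=30):
    """Approximate monthly cost of an environment that is never shut down."""
    return rate_per_hour * 24 * days


# 32-container instance: $5.44 per hour, 5 minutes startup, 17 minutes of testing.
on_demand_run = run_cost(5.44, startup_minutes=5, test_minutes=17)

# Legacy environment: $0.50 per hour, always running.
legacy_month = always_on_monthly_cost(0.50)

print(f"one on-demand run: ${on_demand_run:.2f}")
print(f"always-on month:   ${legacy_month:.2f}")
```

Under these assumptions a full on-demand run costs about $2, while the always-running legacy environment adds up to roughly $360 per month regardless of how often tests run.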
The STRT is proud of the progress towards ensuring detections are easy to use and work as expected. Using the new testing framework, STRT has already improved a large number of detections and gained further confidence in the Splunk Security Content that is delivered to customers. STRT will continue to improve its quality assurance work by: