Now generally available, Splunk Edge Processor supports syslog-based ingestion protocols, making it well-equipped to wrangle complex and superfluous data. Users can deploy Edge Processor as an end-to-end solution for handling syslog feeds such as PAN logs, including the functionality to act as a syslog receiver, process and transform logs and route the data to supported destination(s).
Before you start writing a SPL2 pipeline to process and transform incoming logs, configure Edge Processor to natively listen for events coming over syslog by:
Once you have configured Edge Processor to receive syslog events, it will appear in the Edge Processor console.
The addition of syslog support to Edge Processor expands the ability to filter, mask, transform, and route data generated by network devices, Linux/Unix-like operating systems, and more.
Let’s delve into real-world examples of wrangling data in motion in the context of cybersecurity, specifically, log reduction of Palo Alto Networks (PAN) sources. TL;DR – no worries, check out this video demo to see this use case in action.
Have you ever been swamped by the relentless surge of log data from your Palo Alto Networks (PAN) devices? Ever felt like finding the crucial information within these logs is akin to searching for a needle in a haystack? You're not alone.
The current volume and frequency of PAN firewall log data in syslog results in delayed incident detection, longer search processing times, and slow response and automation. And given that not all log types are created equal, nor equally meaningful to an organization's needs, there are also increased log management costs to contend with. The table below characterizes the most common log types according to size and volume:
| Log Type | Splunk Sourcetype | Log Size | Log Volume |
|---|---|---|---|
| Traffic | pan:traffic | Large | Very High |
| Threat | pan:threat | Large | Low |
| Threat : url* | pan:threat | Large | Very High |
| Threat : file* | pan:threat | Large | High |
| System | pan:system | Medium | Medium |
| Configuration | pan:config | Small | Low |
| Correlation | pan:correlation | Small | Low |
| HIP Match | pan:hipmatch | Small | Medium |
*Note: URL and File logs are of type Threat, but are called out separately because they have a different frequency than most threat logs.
You can see from the table that Traffic logs and URL logs are the most frequent and largest, with File logs coming in second. These log types will make up the bulk of what would be ingested and indexed in Splunk.
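To make the potential impact concrete, here is a rough back-of-the-envelope sketch in Python. The daily volumes and the 60% traffic-drop rate are entirely hypothetical illustrations, not measurements; substitute figures from your own environment:

```python
# Hypothetical daily ingest per PAN log type, in GB.
# Illustrative numbers only -- measure your own environment.
daily_gb = {
    "traffic": 500.0,
    "threat_url": 300.0,
    "threat_file": 120.0,
    "threat_other": 40.0,
    "system": 20.0,
    "config": 2.0,
    "correlation": 1.0,
    "hipmatch": 5.0,
}

# Suppose a pipeline drops all correlation and HIP match events,
# plus an (assumed) 60% of traffic sessions from known-noisy sources.
saved = daily_gb["correlation"] + daily_gb["hipmatch"] + 0.6 * daily_gb["traffic"]
total = sum(daily_gb.values())
pct = 100 * saved / total
print(f"Saved {saved:.0f} of {total:.0f} GB/day ({pct:.1f}%)")
```

Even a coarse estimate like this helps prioritize which pipeline rules are worth writing first.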
Generally speaking, Edge Processor can support log reduction in several ways.
More specifically, consider this example — backup software runs every night, generating thousands of connections from endpoints to a backup server. This generates a large volume of low value data that is not critical to detecting threats. Enter Edge Processor! Create a pipeline in Edge Processor for this backup app to retain only threats and drop all other events belonging to log traffic sessions, URLs, or files.
Let’s take a closer look at the challenge of gaining control of PAN logs via syslog, where our ultimate goal is to improve search performance. How? By reducing event size; removing unnecessary, “noisy” fields; and routing a full-fidelity copy of the data that is to be maintained for compliance purposes in AWS S3 — all of which, in turn, reduces ingestion and storage costs.
In particular, we aim to reduce event size, remove noisy fields before indexing, and retain a full-fidelity copy in S3 for compliance.
Now, let’s get started with creating pipelines in Edge Processor to transform those PAN logs and ultimately, supercharge your security operations!
The two pipelines below show how a user controls what data the pipeline applies to, how that data is to be processed, and then where the processed data is routed. The first pipeline shows how to filter and minimize data volume on the way to a Splunk index, and the second keeps a raw copy in an AWS S3 bucket for compliance reasons! However, it's essential to note this is one example of how the Edge Processor can be employed. Just like SPL, the actual query definition depends on the nuances of the data (and your creativity!), and we encourage you to tailor the Edge Processor pipelines to best fit your unique needs.
Below, you will see references to commands you may not recognize, like remove_readable_timestamp. These aren’t out-of-box SPL2 commands, but are custom functions that you can define to improve usability. Continue reading to the “Making Security Function-al” section to learn more about user-defined functions.
**Pipeline 1: Filter Palo Alto Firewall logs, route to Splunk Cloud**

Source: sourcetype=pan:firewall
Destination: Splunk index security_paf_index (Splunk destination splunk_stack_security_s2s)

```
$pipeline = | from $source
    // First, drop the human-readable timestamp added by syslog, as it is redundant and unused
    | remove_readable_timestamp
    // Then, extract the useful fields, like other timestamps and event type
    | extract_useful_fields
    // Drop events of specific types and subtypes which are not useful for security analysis
    | drop_security_noise
    // Field extraction generates extra fields not needed at index time, so keep only _raw
    | fields _raw
    // Lastly, route the filtered events to a specific index used for security incident analysis
    | eval index="security_paf_index"
    | into $destination;
```

**Pipeline 2: Route an unfiltered copy of all PAN firewall logs to an AWS S3 bucket**

Source: sourcetype=pan:firewall
Destination: S3 bucket security_compliance_s3

```
$pipeline = | from $source
    | into $destination;
```
As you review the first pipeline definition, you might be thinking, “wow, those SPL2 commands are super readable and straightforward!” — and you’d be right! Or at second thought, you may wonder, “hang on, there’s no way extract_useful_fields is an out-of-box SPL2 command, so how does Splunk know what’s a useful field?” — and you’d also be right!
The extract_useful_fields command is made possible through custom SPL2 functions. Custom SPL2 functions are named, reusable blocks of SPL2 code that can wrap complex SPL2 in a simple custom command or eval function; think of it like an SPL macro, but way more powerful! Let's explore this capability further.
```
function remove_readable_timestamp($source) {
    return
    | from $source
    | eval readable_time_regex = "\\w{3}\\s\\d{2}\\s\\d+:\\d+:\\d+"
    | eval _raw = replace(_raw, readable_time_regex, "")
    | fields -readable_time_regex
}
```
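If you want to sanity-check that regex outside of SPL2, the same substitution can be sketched in Python. The sample log line below is invented for illustration, and `count=1` is used here to conservatively strip only the leading timestamp:

```python
import re

# Same pattern as in remove_readable_timestamp: matches a syslog-style
# human-readable timestamp such as "Aug 15 10:21:32".
readable_time = r"\w{3}\s\d{2}\s\d+:\d+:\d+"

raw = "Aug 15 10:21:32 1,2023/08/15 10:21:32,001122334455,TRAFFIC,end"
cleaned = re.sub(readable_time, "", raw, count=1).lstrip()
print(cleaned)  # "1,2023/08/15 10:21:32,001122334455,TRAFFIC,end"
```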
```
function extract_useful_fields($source) {
    return
    | from $source
    | rex field=_raw /(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2}),([\w\d]+),(?P<event_type>[A-Z]+),(?P<event_subtype>[\w\d]*),\d*,(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})/
}
```
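The same named-group extraction can be tried locally in Python before committing it to a pipeline. The sample line below is an invented stand-in for a PAN CSV header (receive time, serial, type, subtype, a numeric field, generated time):

```python
import re

# Same named-group pattern as extract_useful_fields.
pan_fields = re.compile(
    r"(\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}),([\w\d]+),"
    r"(?P<event_type>[A-Z]+),(?P<event_subtype>[\w\d]*),\d*,"
    r"(\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2})"
)

raw = "2023/08/15 10:21:32,001122334455,THREAT,url,2561,2023/08/15 10:21:31"
m = pan_fields.search(raw)
print(m.group("event_type"), m.group("event_subtype"))  # THREAT url
```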
```
function drop_security_noise($source) {
    return
    | from $source
    | where not(event_type IN ("CORRELATION", "HIPMATCH"))
    | where not(event_type IN ("SYSTEM"))
        or (event_type IN ("SYSTEM") and not(event_subtype IN ("routing", "ras")))
}
```
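The keep/drop predicate behind drop_security_noise is easy to mirror in plain Python, which can be handy for unit-testing the filtering logic against sample events (the events below are hypothetical):

```python
NOISE_TYPES = {"CORRELATION", "HIPMATCH"}
NOISY_SYSTEM_SUBTYPES = {"routing", "ras"}

def keep_event(event_type: str, event_subtype: str) -> bool:
    """Mirror of drop_security_noise: True means the event is retained."""
    if event_type in NOISE_TYPES:
        return False
    if event_type == "SYSTEM" and event_subtype in NOISY_SYSTEM_SUBTYPES:
        return False
    return True

events = [
    ("TRAFFIC", "end"),
    ("CORRELATION", ""),
    ("SYSTEM", "routing"),
    ("SYSTEM", "general"),
]
kept = [e for e in events if keep_event(*e)]
print(kept)  # [('TRAFFIC', 'end'), ('SYSTEM', 'general')]
```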
As you can see, the bodies of each of these custom SPL2 functions are composed of standard SPL2 — just like a macro. All a user needs to do is use the functions in the pipeline, but if you prefer to inline all SPL2 in your pipeline without using custom functions, you absolutely can:
```
$pipeline = | from $source
    // Remove readable & redundant timestamp
    | eval readable_time_regex = "\\w{3}\\s\\d{2}\\s\\d+:\\d+:\\d+"
    | eval _raw = replace(_raw, readable_time_regex, "")
    | fields -readable_time_regex
    // Extract useful fields
    | rex field=_raw /(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2}),([\w\d]+),(?P<event_type>[A-Z]+),(?P<event_subtype>[\w\d]*),\d*,(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})/
    // Drop security noise
    | where not(event_type IN ("CORRELATION", "HIPMATCH"))
    | where not(event_type IN ("SYSTEM"))
        or (event_type IN ("SYSTEM") and not(event_subtype IN ("routing", "ras")))
    | fields _raw
    | eval index="security_paf_index"
    | into $destination;
```
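For a local end-to-end sanity check of the same logic before deploying, the whole sequence can be sketched in Python. This is only a mirror of the pipeline's behavior under the assumptions above, and the sample line is invented:

```python
import re

# Python mirror of the inline pipeline, for local sanity-checking only.
READABLE_TIME = r"\w{3}\s\d{2}\s\d+:\d+:\d+"
PAN_FIELDS = re.compile(
    r"\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2},[\w\d]+,"
    r"(?P<event_type>[A-Z]+),(?P<event_subtype>[\w\d]*)"
)

def process(raw: str):
    """Return (index, _raw) for kept events, or None for dropped ones."""
    raw = re.sub(READABLE_TIME, "", raw, count=1).lstrip()  # remove timestamp
    m = PAN_FIELDS.search(raw)                              # extract fields
    if m is None:
        return None
    t, sub = m.group("event_type"), m.group("event_subtype")
    if t in {"CORRELATION", "HIPMATCH"}:                    # drop noise
        return None
    if t == "SYSTEM" and sub in {"routing", "ras"}:
        return None
    return ("security_paf_index", raw)                      # route

sample = "Aug 15 10:21:32 2023/08/15 10:21:32,001122334455,TRAFFIC,end"
print(process(sample))
```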
Given the capabilities described above, the Edge Processor stands out with its resilient approach to modern log reduction. Its robust security foundation filters essential information and efficiently manages log data, reducing incident analysis time and accelerating threat identification. Edge Processor goes beyond these core functions, unifying security operations through effortless integration with Splunk Cloud Platform and paving the way for easier alert enrichment in future updates. The result is a tool that empowers security teams to detect and respond to threats with unmatched speed and precision, ensuring minimal disruptions to the current infrastructure.
Splunk Cloud Platform customers can access the Edge Processor for free! To activate an Edge Processor tenant in your environment, contact your Splunk sales representative or shoot an email to EdgeProcessor@splunk.com with your details.
Together, let’s make your security operations smarter, faster, and more robust!
This blog was co-authored by Xi He and Sri Tejaswi Gattupalli, Product Manager Interns, Summer 2023 and Aditya Tammana, Senior Product Manager.
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company — with over 7,500 employees, Splunkers have received over 1,020 patents to date and availability in 21 regions around the world — and offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.