By now, you may have heard the exciting news that Edge Processor, the easy-to-use Splunk data preparation tool for filtering, transformation, and routing at the edge, is now Generally Available. Edge Processor gives data administrators for Splunk environments the ability to drop unnecessary data, mask sensitive fields, enrich payloads, and conditionally route data to the appropriate destination. Managed via Splunk Cloud Platform but deployed at the customer data edge, Edge Processor helps you control data costs and prepare your data for effective downstream use.
Alongside the announcement of the GA of Edge Processor, we are also excited to announce the General Availability of the SPL2 Profile for Edge Processor! The SPL2 Profile for Edge Processor contains the specific subset of powerful SPL2 commands and functions that can be used to control and transform data behavior within Edge Processor, and represents a portion of the entire SPL2 language surface area.
In Edge Processor, there are two ways to define your processing pipelines. The first, which is fantastic for quick and easy pipeline authoring, lets data administrators take advantage of the point-and-click features of the Edge Processor pipeline editor. From that same experience, users can also opt to work directly in the SPL2 code editor window for extremely flexible pipeline authoring, using Splunk’s SPL2 language in a manner familiar to SPL experts. This is extremely exciting, as it allows SPL syntactical patterns to be used for transformations on data in motion! Let’s learn a bit more.
SPL2 is Splunk’s next-generation data search and preparation language, designed to serve as the single entry point for a wide range of data handling scenarios; in the future, it will be available across multiple products. Users can leverage SPL2 to author pipelines that process data in motion and to create and validate data schemas, all with the help of in-line tooling and documentation. SPL2 seeks to enable a “learn once, use anywhere” language model across all Splunk features, in a manner extremely familiar to SPL users today.
SPL2 takes the great parts of SPL — the syntax, the most used commands, the investigation-friendliness, and the flow-like structure — and makes it available for use not only against data at rest (e.g., via splunkd), but also for streaming runtimes. This allows data administrators, developers, and others who are familiar with SPL, but unfamiliar with configuring complex rules in props and transforms, to translate their existing SPL knowledge and apply it directly to data in-motion, via Edge Processor.
A template for an SPL2 pipeline that masks IP addresses from the hostname field of syslog data
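As a flavor of what such a pipeline can look like in SPL2, here is a minimal sketch of a masking pipeline along those lines (illustrative only; the field name, regex, and replacement are placeholder assumptions, not the template’s exact contents):

$pipeline = from $source
| eval hostname = replace(hostname, "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}", "x.x.x.x")
| into $destination;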
SPL2 is already used under the hood by multiple Splunk products today to handle data preparation, processing, search, and more. Over time, we intend to make SPL2 available across the entire Splunk portfolio to support a truly unified platform.
Customers familiar with SPL will be very pleased to hear that SPL2 introduces a range of new functionality to more seamlessly support data preparation in motion, such as expression assignments and the ability to write to datasets, both covered below.
SPL2 supports a wide range of operations on data. The SPL2 profile for Edge Processor represents a subset of the SPL2 language that can be used via the Edge Processor offering. For example: at launch, Edge Processor is primarily built to help customers manage data egress, mask sensitive data, enrich fields, and prepare data for use in the right destination. SPL2 commands and eval functions that support these behaviors are supported in the profile for Edge Processor to ensure a seamless user experience. Learn more about SPL2 profiles and view a command compatibility matrix by product for SPL2 commands and eval functions.
Edge Processor pipelines are logical constructs that read in data from a source, conduct a set of operations on that data, and then write that data to a destination. All pipelines are defined entirely in SPL2 (either directly, in the Edge Processor code editor, or indirectly, via the GUI for pipeline authoring). SPL2 pipelines define an entire set of transformations, often related to similar types of data.
All pipelines must follow this syntax:
$pipeline = from $source | <processing logic> | into $destination;
Take the below Edge Processor pipeline, defined in SPL2:
$pipeline = from $source | rex field=_raw /user_id=(?P<user_id>[a-zA-Z0-9]+)/ | into $destination;
The SPL2 pipeline above can be decomposed into multiple components:
$pipeline_part_1 = from $source | where … | rex field=_raw /fieldA… fieldB… fieldC…
$pipeline = from $pipeline_part_1 | eval … | into $destination;
As you can probably tell, there are some differences between the SPL2 here and the SPL you know. The first is that SPL2 allows for not just single expressions, but expression assignments; entire searches can be named, treated as variables and linked together to compose a single dispatchable unit. SPL2 also supports writing into datasets, not just reading from datasets (and with a slightly different syntax). Datasets can be different things — indexes, S3 buckets, forwarders, views, and more. You’ll likely be writing to a Splunk index most of the time. You can find more details about the differences between SPL2 and SPL here.
But what if your pipeline isn’t constrained to a single sourcetype? For these scenarios, you can instead read from a specific dataset called all_data_ready (the consolidation of all Edge Processor ingress data) and apply any sourcetype logic that you’d like:
$pipeline = from all_data_ready | where sourcetype="WMI:WinEventLog:*" | rex field=_raw /user_id=(?P<user_id>[a-zA-Z0-9]+)/ | into $destination;
You may have begun to see that SPL2 is not just a set of commands and functions, but also a set of core concepts underneath that enable powerful data processing scenarios. In fact, Edge Processor ships with out-of-the-box SPL2 pipeline templates that address common data preparation use cases.
Beyond these templates, let’s walk through a few examples that highlight how SPL2 makes data preparation simpler.
SPL2 allows pipelines to be defined in multiple stages, for ease of organization, debugging, and logical separation. Using statement assignments as variables later in the SPL2 module allows data admins to modularly compose their data preparation rules.
$capture_and_filter = from all_data_ready | where sourcetype="WinEventLog:*";
$extract_fields = from $capture_and_filter | rex field=_raw /^(?P<dhcp_id>.*?),(?P<date>.*?),(?P<time>.*?),(?P<description>.*?),(?P<ip>.*?),(?P<nt_host>.*?),(?P<mac>.*?),(?P<msdhcp_user>.*?),(?P<transaction_id>.*?),(?P<qresult>.*?),(?P<probation_time>.*?),(?P<correlation_id>.*?),(?P<dhc_id>.*?),(?P<vendorclass_hex>.*?),(?P<vendorclass_ascii>.*?),(?P<userclass_hex>.*?),(?P<userclass_ascii>.*?),(?P<relay_agent_information>.*?),(?P<dns_reg_error>.*?)/;
$indexed_fields = from $extract_fields | eval dest_ip = ip, raw_mac = mac, signature_id = dhcp_id, user = msdhcp_user;
$quarantine_logic = from $indexed_fields | eval quarantine_info = case(qresult==0, "NoQuarantine", qresult == 1, "Quarantine", qresult == 2, "Drop Packet", qresult == 3, "Probation", qresult == 6, "No Quarantine Information");
$pipeline = from $quarantine_logic | into $destination;
As you can see above, we’ve defined four processing “stages” of this pipeline: $capture_and_filter, $extract_fields, $indexed_fields, and $quarantine_logic, with each flowing into the next, and of course with $pipeline tying it all together into the destination. When the $pipeline is run, all stages are concatenated behind the scenes, allowing the pipeline to work as expected while maintaining a degree of logical segmentation and readability.
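For intuition, the concatenated result is roughly equivalent to a single statement like this (the full rex pattern from $extract_fields is elided here for brevity):

$pipeline = from all_data_ready
| where sourcetype="WinEventLog:*"
| rex field=_raw /^(?P<dhcp_id>.*?),…/
| eval dest_ip = ip, raw_mac = mac, signature_id = dhcp_id, user = msdhcp_user
| eval quarantine_info = case(qresult==0, "NoQuarantine", qresult == 1, "Quarantine", qresult == 2, "Drop Packet", qresult == 3, "Probation", qresult == 6, "No Quarantine Information")
| into $destination;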
If you’ve ever worked with JSON in Splunk, you know that it can be…tricky. It’s a never-ending combination of mvindexes, mvzips, evals, mvexpands, splits, and perhaps even SEDCMD in props.conf, often looking something like the sketch below.
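For example, exploding an array of JSON objects into separate events in classic SPL often involves a chain like this (a rough sketch; the index, sourcetype, and field names are illustrative assumptions):

index=main sourcetype=my_json
| spath output=item path={}
| mvexpand item
| spath input=item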
With SPL2, it’s easier than ever, thanks to the expand() and flatten() commands! Often used together, they first expand a field that contains an array of values to produce a separate result row for each object in the array, then flatten the key-value pairs in each object into separate fields on the event, repeating as many times as necessary.
Let’s take the JSON below, passed as a single event, as an example, and assume it is represented by a dataset named $json_data. We want to create the (previously missing) timestamp at index time and extract each nested object into its own event:
[
  {
    "key": "Email",
    "value": "john.doe@bar.com"
  },
  {
    "key": "ProjectCode",
    "value": "ABCD"
  },
  {
    "key": "Owner",
    "value": "John Doe"
  },
  {
    "key": "Email",
    "value": "jane.doe@foo.com"
  },
  {
    "key": "ProjectCode",
    "value": "EFGH"
  },
  {
    "key": "Owner",
    "value": "Jane Doe"
  }
]
By itself and without preparation, this arrives as a single event with the fields stuck inside the JSON body.
But, we can write the following SPL2 to easily flatten this JSON and timestamp it:
$pipeline = from $json_data as json_dataset | eval _time = now()
| expand json_dataset | flatten json_dataset | into $destination;
This should extract the single JSON event into multiple events, each with its own fields.
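While exact results will depend on your data, conceptually each array element becomes its own event, with key and value as top-level fields (values shown schematically):

_time=<now>, key="Email", value="john.doe@bar.com"
_time=<now>, key="ProjectCode", value="ABCD"
_time=<now>, key="Owner", value="John Doe"
_time=<now>, key="Email", value="jane.doe@foo.com"
_time=<now>, key="ProjectCode", value="EFGH"
_time=<now>, key="Owner", value="Jane Doe"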
SPL2 within Edge Processor is extremely powerful, and this blog post only scratches the surface! If you’re interested in learning more about SPL2 or the SPL2 Profile for Edge Processor, join in! Reach out to your account team to get connected, or start a discussion in splunk-usergroups Slack.