Previous installments of this series have given you the overview and configuration details you need to ingest any source that is supported by Splunk Connect for Syslog and configure customizations and overrides that match your enterprise. This leaves one key capability of SC4S that we have not yet covered, and that is extending the platform itself.
In this installment, we'll walk through the configuration of an entirely new data source – one that SC4S does not address out of the box. Let's dive into the task of adding support for a new data source to SC4S!
Prior to launching an effort to create a log path (filter) for a new device, you’ll want to gather answers to some key questions:
The answers to the above questions will guide you as you extend the platform with a new log path. Let’s start with some background on what the overall structure of a syslog-ng configuration looks like, as an understanding of this will be necessary for crafting a log path.
Here is the overall structure of a syslog-ng configuration file. The syslog-ng configuration syntax, itself essentially a programming language, offers myriad ways to accomplish the same thing, but all configurations follow this same basic scheme:
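The following is a minimal, illustrative sketch of that scheme (not an SC4S configuration; all names here are hypothetical): named source, filter, parser, and destination blocks, tied together by a log path:

source s_network {
    network(transport("udp") port(514));     # where events enter
};

filter f_example {
    program("ExampleProgram");               # a test applied to each event
};

parser p_example {
    kv-parser(prefix("example."));           # optional parsing of the message body
};

destination d_example {
    file("/var/log/example.log");            # SC4S uses a Splunk HEC (http()) destination here instead
};

log {
    source(s_network);
    filter(f_example);
    parser(p_example);
    destination(d_example);
};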
In SC4S, most of this is abstracted via the configuration mechanisms described in Part 3, which frees the administrator from understanding the nuances of the syslog-ng syntax. But the structure of a log path will become immediately apparent when developing a new one, and an understanding of how the parts fit together is crucial.
In a typical syslog-ng configuration — including SC4S — there will be several log paths, one for each “flavor” of event (device). These event formats are typically set by the vendors themselves and should comply with published (RFC 3164 or RFC 5424) syslog standards, but many have deviations from these standards which must be taken into account in the log paths.
Events flow from top to bottom in the final config file, with each one getting tested by the filters in each log path to see if that event “belongs” there. Though the diagram above shows filters as a separate entity, in reality all stages (except for the final destination directive) act as a filter (or test). This means the source block itself acts as a filter; if the source block says, “Collect on UDP port 5000” and the event shows up on UDP port 514, that log path will not be used for that event. Similarly, message parsing can also act as a filter (if desired) and exclude events from that log path if the parsing fails. If the event “survives” this long in a log path, it is ultimately sent to one or more destinations of the administrator’s choice.
There is nothing preventing an event from matching more than one log path, but we discourage this in SC4S and indeed configure each log path with a flag that terminates processing once an event is successfully processed by a given log path. In this case "first one wins," and this is the only place in syslog-ng where log path filenames matter: log paths are processed in lexicographical (essentially alphabetical) order by filename. Internally, SC4S uses appropriate filenames to force certain log paths to be ahead of (or behind) all others, forcing a "winner" should more than one log path "fire" based on the filtering alone. This technique is used only for fallback (catchall) and "null queue" log paths, so it should not affect any log paths developed in the field.
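As a simple illustration (the source, filter, and destination names below are hypothetical), the terminating behavior comes from the final flag on the log statement:

log {
    source(s_DEFAULT);
    filter(f_example_vendor);
    destination(d_splunk_hec);
    flags(final);    # if this log path processed the event, do not test it against later log paths
};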
We will now explore a key feature of SC4S, which is fundamental for log path creation. Remember early in Part 3 where we discussed the environment variables in env_file? If we specify a HEC endpoint URL and token with an environment variable there, how does that get translated into the syslog-ng syntax, which effectively must be “hard coded”? The answer is templating, a key part of abstracting the underlying syntax from the administrator, and a key part of making log paths far easier to create.
Syslog-ng syntax is very strict, and while it is close to a full programming language, it is missing some key constructs – in particular, the ability to interact with the running environment and adapt its configuration based on conditional testing of environment variables. When syslog-ng is instantiated, the configuration must be fully resolved. Therefore, SC4S needs a mechanism to dynamically build this fixed configuration just prior to the launch of syslog-ng. Enter "gomplate", or "go templates".
The templating process allows for environment variables to dictate the final syslog-ng configuration used by SC4S. Consider this environment variable:
SC4S_SOURCE_UDP_SO_RCVBUFF=33554432
How does the value of this variable work its way into the final config? The key is templating; the syslog-ng source config inside the container is not hard coded, but instead looks like this for the UDP receive buffer:
so-rcvbuf({{getenv "SC4S_SOURCE_UDP_SO_RCVBUFF" "1703936"}})
Everything inside the double curly brace pairs is part of the template (which itself is its own language) and is used to conditionally insert configuration elements based on environment variable settings. The result is the following final configuration that replaces the default value (1703936) with the value from the variable:
so-rcvbuf(33554432)
The example above is a simple substitution; indeed, more complex conditional replacements can also be made:
{{- if or (conv.ToBool (getenv "SC4S_ARCHIVE_GLOBAL" "no")) (conv.ToBool (getenv "SC4S_ARCHIVE_CISCO_ASA" "no")) }} destination(d_archive); {{- end}}
This construct inserts the alternate archive destination into the configuration if either of the ARCHIVE variables is set to "yes". If neither variable is set in the env_file, or both are set to "no", the text between the "if/end" conditional is not inserted into the configuration.
We will now turn our attention to the process of creating a log path, which makes heavy use of the templating process described above. But there are a few items we must take care of prior to writing the log path which will aid us in the process. A critical step — after determining that a log path is indeed necessary by checking the “Prerequisite Tasks” at the beginning of this section — is obtaining a suitable raw event to work with.
You may have experience with collecting raw data samples when configuring SC4S, particularly if events land in Splunk with the wrong metadata (sourcetype, etc.). This task is critical for log path development and should be the first technical step taken. A number of options exist for this, the two most common being tcpdump and SC4S itself. The details are documented here; they are summarized briefly below and reviewed in our walkthrough.
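If you use SC4S itself to capture the sample, the raw message can be preserved as an indexed field by adding a variable to the env_file (this is the variable documented at the time of writing; confirm against the docs for your release):

SC4S_SOURCE_STORE_RAWMSG=yes

The unprocessed event then appears in Splunk in the RAWMSG indexed field, which is what you will see in the screenshot below.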
We now have the data we need to walk through an example log path. Let's dive in!
There are two types of log paths in SC4S: "Simple" and "Traditional". Simple log paths can be used when the device is capable of sending on a unique port and minimal, protocol-only parsing is sufficient to determine a single sourcetype for the event. These can be configured entirely via environment variables, and do not require the development of a dedicated log path. More details on simple log paths can be found here.
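As a rough illustration of just how little is involved (the port and key below are hypothetical, and the exact variable-name convention is spelled out in the documentation linked above), a simple log path can amount to a single env_file entry that opens a dedicated port:

SC4S_LISTEN_SIMPLE_EXAMPLE_VENDOR_PRODUCT_UDP_PORT=5999

plus a splunk_metadata.csv entry (as covered in Part 3) to assign the sourcetype and index for that key.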
On the other hand, if the device family requires multiple sourcetypes (e.g. Palo Alto), a traditional log path with more comprehensive parsing must be developed. As part of our walkthrough, we will determine if our new device can be supported with a simple log path, or whether a traditional one needs to be developed.
We will use the Stealthbits StealthINTERCEPT product as an example for our new log path. The configuration of the App and the device for syslog operation is typical in that a raw, "on the wire" sample is not provided. Therefore, tcpdump or SC4S must be used to obtain a raw sample as outlined above.
The following steps for creating a log path that will support the StealthINTERCEPT device family will be outlined below:
Let's begin by looking at the raw sample in Splunk (either by listening to a real device or using the "echo" command outlined above):
Send the event to SC4S (edited for brevity):
echo "<37>`date +\"%b %d %H:%M:%S\"`"'.986 stealth-host StealthINTERCEPT - Authentication Auth failed - PolicyName="StealthDEFEND for AD" Domain="TDOMAIN" Server="TDOMAIN-DC" ServerAddress="10.2.8.55" Perpetrator="MarkB" ClientHost="AP34.TEST.COM" ClientAddress="10.2.8.55" TargetHost="TDOMAN-DC.TEST.COM" TargetHostIP="10.135.33.7"' > /dev/udp/sc4s.test.com/514
Here is the event in Splunk; you can see that RAWMSG is turned on:
A couple of things stand out here:
Now that we know that we need to develop a traditional log path, where do we start? Recall the directory structure outlined above. You'll see the directory
/opt/sc4s/local/config/log-paths
There, you will see two files:
lp-example.conf
lp-example.conf.tmpl
The files in the log-paths directory (along with those in the other config directories) are all "live" syslog-ng configuration files and are included in the overall configuration when the container (and underlying syslog-ng process) is run. Therefore, the syntax must be perfect or the whole affair will fail to start. For this reason, many of the .conf files are not edited directly, but rather through their .tmpl variants. In addition to the variable substitution and conditional substitutions discussed in Part 3, the templating process allows us to abstract complex parts of the log path that need not be exposed to (and configured by) the administrator.
Let's take a look at the lp-example.conf.tmpl file. Don't worry if the specific example file in your particular SC4S release differs from the one used for these screenshots; the example file changes periodically as SC4S is enhanced and refined, and may well have been simplified further by the time you read this. Focus on the overall structure while reviewing the steps below; it will remain consistent regardless of the specifics of the file.
You will see, first of all, that the file is heavily commented for instructional purposes; the number of executable lines that result is actually quite small. Second, though you are unlikely to have a context-aware text editor (e.g. Sublime in "C" language mode) on your SC4S server, it helps to use one off-box when initially creating your template file from the example file above, as it will help with syntax checking.
Let's walk through the steps needed to convert this "example" file into a log path that works with a real device, in this case our "Stealthbits" example. The following steps will prepare the new log path for customization specific to the device:
After these string replacements, we will now look at each section in turn. Careful examination of the snippets in the sections below will show the string replacements compared to the full screenshot of the unaltered example file above.
In accordance with the outline at the beginning of this section, we will dissect this file into four main sections: the source, the filter, the parsing (Splunk metadata assignment), and the destination(s).
Let's start with the "Source" portion of the log path. Starting at line 28, we see:
These two templated lines do most of the work for you, courtesy of the templating process. All that is needed to create a custom source declaration for your device is the main STEALTHBITS_INTERCEPT string, which forms the "root" of the environment variables used to set unique listening ports, and a "parser" value set to match the high-level structure of the events (if you are unsure, use "common"). Keep in mind this section creates the source declaration – i.e. the function that is called from within the log path, as we'll see below.
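For reference, the shape of those lines is roughly as follows; "source_helper" is a placeholder standing in for the actual SC4S helper template invoked in your release's example file, so copy the real call from lp-example.conf.tmpl rather than from here:

{{- /* Create the custom source and its optional unique listening ports */}}
{{- template "source_helper" (dict "port_id" "STEALTHBITS_INTERCEPT" "parser" "common") }}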
Next, we will explore the beginning of the log path itself, starting at line 32 (line 34 truncated):
You will see that each event takes two parallel pathways through a filter "junction", which then "merge" after all "channel" elements are traversed. In the top channel, the newly created source called s_STEALTHBITS_INTERCEPT is checked: if an event arrives on that source (unique port), it is passed through to the remainder of the log path with no further filtering. If, on the other hand, the event arrives over the default s_DEFAULT port (typically UDP or TCP 514), there is an additional filter (f_stealthbits_intercept) that must also match, or the event will not be "allowed in" to this log path. This filter can be declared (just like the custom source) immediately above the log{} stanza in the log path file, or it can be included in a separate file as shown below. Like the source, it is just a function with the name f_stealthbits_intercept:
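Here is a hedged sketch of what such a filter can look like. The first PROGRAM value comes from the sample event above; the second is purely illustrative – use the values that your actual events show:

filter f_stealthbits_intercept {
    program("StealthINTERCEPT") or
    program("SI_Agent");    # second value is a hypothetical example
};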
Take a look at lines 2 and 3. Remember the raw message screenshot in Splunk (above)? Look for the field PROGRAM. This is an example of how the initial parsing pass of syslog-ng can be extremely useful for building filters in log paths, and lines 2 and 3 show how this field ("macro" in syslog-ng parlance) is checked to see if it matches the two values shown. You'll also see in the following section below how the same macro can be very useful when assigning sourcetypes – which the TA will expect.
In this phase of the development, the full complement of the syslog-ng config programming language can be brought to bear. While you can extensively parse the full event payload and even go as far as complete field extraction a la Splunk itself, it is best to limit the parsing to just the Splunk metadata that will need to be sent along with the event to be indexed. This metadata includes the normal index, time, host, source, and sourcetype. Note that time is included in this list; we want to ensure this is properly parsed before it gets to Splunk, as timestamp processing is bypassed (by default) with the /event HEC endpoint used by SC4S.
Here is the "guts" of the log path, where this metadata assignment is done:
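The snippet below is a generic illustration of the idea in plain syslog-ng, not the actual SC4S code: the real log path calls SC4S helper rewrite functions, and the internal field name and sourcetype value shown here are assumptions.

rewrite r_set_splunk_dest_stealthbits_intercept {
    # assign the sourcetype the TA expects (value and field name are illustrative)
    set("StealthINTERCEPT", value(".splunk.sourcetype"));
};

# inside the log path, the PROGRAM macro can drive which rewrite is applied
if (program("StealthINTERCEPT")) {
    rewrite(r_set_splunk_dest_stealthbits_intercept);
};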
Several rewrite functions are made available to the developer as shown, so even this section can be "plug and play" for most log paths. The defaults for all Splunk metadata are set using the rewrites on lines 56 and 61 – and which rewrite is used depends on the value of the macro PROGRAM. Again, the initial syslog-ng parsing has been put to good use here. You can see that the sourcetype is set when these functions are called, but none of the other metadata is. This is because the other metadata (host, time, and source) is typically set at ingest time (in the source declaration) and does not need to be specifically set (or overridden) here.
But what about the index? We typically don't want to default that, so where is it set? It is done on line 66 – which is the parser that consults the splunk_metadata.csv file discussed in Part 3. The sole argument passed to that function is the key that the developer assigns (again, using the vendor_product convention). Similarly, the compliance_meta_by_source.* files are referenced in the parser called on line 70, which is the last lookup consulted before the event is sent out to one or more destinations, described next.
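For example, assuming the developer chose the key stealthbits_intercept (following the vendor_product convention), a single line in splunk_metadata.csv overrides the index for these events (the index name here is just an example):

stealthbits_intercept,index,oswinsec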
After having all variables set (including several indexed fields derived from the initial syslog-ng parsing), the event is ready to be sent to one or more destinations. These destinations are heavily controlled by environment variables, which in turn means several templating constructs. This section of the file is shown below:
The final step in preparation for sending out is the setting of the output template, shown on line 75. An appropriate default template (based largely on what the TA expects) is chosen. These templates are all documented and are constructed from the various syslog-ng macros (PROGRAM, MESSAGE, etc.). This default can be overridden via splunk_metadata.csv if desired.
Finally, in lines 81 through 102, environment variables are consulted and appropriate destinations are added to the final config file. The log path then ends with two flags – one to tell syslog-ng to flow-control TCP traffic if necessary (UDP cannot be flow controlled), and the other to cause the event to not enter any other log path, but instead to terminate further processing.
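After templating, the tail of the log path reduces to something like the following sketch (the destination name is illustrative; the real list depends on your environment variable settings):

    destination(d_hec);              # Splunk HEC destination selected by the environment variables
    flags(flow-control, final);      # flow-control TCP if needed; stop further log path processing
};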
So what does this all look like when the template file is passed through the gomplate templating engine? Here is the output after template processing of the example above, with the following variables
SC4S_LISTEN_STEALTHBITS_INTERCEPT_TCP_PORT=5015 SC4S_DEST_GLOBAL_ALTERNATES=d_hec_debug
set in the env_file:
First off, you will see all the "curly brace" gomplate code is now gone throughout. The file starts off with the source declaration (which is only 3 lines of "gomplate" code), and has expanded to several lines (11-48) in the final output. The source declaration handles much of the initial metadata and preparation of indexed fields, as well as the setup of the listening socket on TCP port 5015 based on the env_file setting. The filtering and parsing sections (lines 50-87) pass through the templating engine relatively unchanged, while the destination section (which was several lines of gomplate code) is effectively reduced to 3 lines of code in the final output (lines 91-93) and includes the d_hec_debug destination, again as a result of the env_file setting.
Here is the final result as it appears in Splunk. Note that the output format is no longer JSON (as is the case with "fallback" events) but is simply the original event minus the header (<PRI> string, host, and timestamp). This is what most TAs expect.
Here are some tips that are helpful during development:
We realize the above is a "whirlwind tour" and many details were glossed over, particularly the nuances of the syslog-ng configuration syntax itself. The community is here to help you with any questions or design challenges you may have! Good luck!
Splunk Connect for Syslog is fully Splunk supported and is released as Open Source. We hope to drive a thriving community that will help with feedback, enhancement ideas, communication, and especially log path (filter) creation! We encourage active participation via the git repos, where formal requests for feature inclusion (especially log paths/filters), bug tracking, etc. can be conducted. Over time, we envision far less "local filter" activity as more and more of the community's efforts are encapsulated in the container's OOTB configs.
There are many resources available to enable your success with SC4S! In addition to the main repo and documentation, there are many other resources available:
We wish you the best of success with SC4S. Get involved, try it out, ask questions, contribute new data sources, and make new friends!
----------------------------------------------------
Thanks!
Mark Bonsack