In Part 1 and Part 2 of this series, we explored the design philosophy behind Splunk Connect for Syslog (SC4S), the goals of the design, and the new HEC-based transport architecture, as well as the rudiments of high-level configuration. We'll now turn our attention to the specifics of SC4S configuration, including a review of the local (mounted) file system layout and the areas in which you'll be working.
Configuration of SC4S occurs at five main levels, outlined below:
Configuration Level |
Files |
Typical Use |
Global and Required Settings |
env_file |
|
Splunk Metadata |
splunk_metadata.csv* |
|
Vendor/Product Event Categorization |
vendor_product_by_source.* |
|
Compliance Overrides Event sub-filters |
compliance_meta_by_source.* |
|
Platform Extension (Log Paths) |
lp-<vendor>.conf.tmpl d_<destination>.conf |
|
We will now look at each of these in turn, but before we do, let’s take a look at the SC4S Container architecture and configuration directory in detail.
The container is configured to mount a local set of directories so that configuration will remain persistent between SC4S restarts. This directory has the following structure:
Directories and files shown in red are part of a standard SC4S configuration and cover the required components such as the URL and Token for the Splunk HEC endpoint (which is the default destination for SC4S traffic) as well as configuration for Splunk metadata (index, etc.). Files contained in the blue directories contain configurations for locally configured collection methods (sources), output destinations and, most commonly, log paths (filters) for data sources that SC4S does not support out of the box. Creation of log paths will be covered in Part 4 of this blog series.
For the rest of this installment, we will focus on the files and directories in red above. Let’s start with the most common (and required) configuration, setting environment variables in the the env_file.
Configuration of SC4S begins with setting required and common optional parameters required for basic SC4S operation, which is done via the env_file. The file contains, as its name implies, environment variables that are used by the container shell at startup, and later consulted by the templating process (described in detail in Part 4 of this blog series) to create the final syslog-ng configuration that drives SC4S. Here is an example of a simple, but in many cases totally sufficient, env_file.
You will see that most of these variables make sense simply by their name. The variables here range from the required (Splunk HEC URL and token) to typical optional variables that are outlined in the documentation. The optional ones specified above are commonly used, however, and merit some explanation:
This list is by no means complete; the documentation covers the full complement of available variables that can be set in this file.
The env_file configuration alone will suffice for many deployments, but there are three key areas that file does not cover:
The context directory contains lookups and syslog-ng configuration stubs that allow all of these customizations to be made.
We will now dive into this section with a few examples, starting with the file that most will require due to unique indexing needs for most enterprises – the splunk_metadata.csv* files, typically located in the /opt/sc4s/local/context directory (diagram). The purpose of these files is to assign Splunk metadata appropriately based on a key (first column) that is set in the log path for that data source. These files work similarly to the way the “default” and “local” conf file directories do in Splunk. The “example” file is analogous to the default conf file in Splunk and should not be edited directly. Indeed, the internal version of the example version of this file is copied to the local directory purely for reference. Here is a portion of that file:
You can scan through this file to determine what index (and other metadata) will be set by default in SC4S (these entries are also documented). The format of this file is:
vendor_product, metadata, value
where vendor_product is an arbitrary key specified by the author of the log path using the convention of the vendor (e.g. cisco) and product (e.g. asa) separated by an underscore. In most cases the key is lower case; the exceptions are CEF and LEEF sources where these values are derived from the event and are not an arbitrary selection by the log path author. For all SC4S-supported data sources, the keys (and index/sourcetype defaults) are documented in the source section of the documentation. The full list of available overrides is available here and includes alternate output templates (which govern how an event will appear in Splunk) as well as traditional metadata. Only one metadata item can be set per row; if more than one metadata item for the same source needs to be set or overridden, the same key can be used in multiple rows.
If a particular metadata item needs to be overridden (most commonly the index), the splunk_metadata.csv file (without the example extension) can be edited to override one or more entries. The local (non-example) file can also be used to specify default metadata for new log paths (we will see this in Part 4 of the blog series):
Similar to local .conf files in Splunk, this one is much shorter than the default. It simply shows an override of the Cisco ASA index, as well as a new entry for the “StealthINTERCEPT” device that does not exist in the example file at all (as it is not supplied OOTB with SC4S). This last entry is for our new log path that we’ll cover in Part 4.
When events arrive into SC4S, they must be categorized properly so that the proper sourcetype and other Splunk metadata can be applied. If an event is captured over its own unique TCP or UDP port, that categorization is easy. More often, however, the event must be captured over the default UDP port 514, which is historically reserved for syslog traffic and will undoubtedly receive events from many different device types. In this case the contents of the event (payload) will be examined to route the event to the proper log path, which further parses the event and assigns proper metadata. While this works for a majority of device types, there is a small but significant set of devices for which the payload is not unique enough to categorize it with certainty.
To account for this, SC4S provides a means to categorize (filter) events based on sending hostname or IP/CIDR address, well before any payload examination is done. The filter relies on the fact that one characteristic of an incoming event is known with certainty (the source IP address) and another with near certainty (the hostname) prior to deep payload inspection. When the filter (rule) “fires” on an event with a specific hostname and/or source IP address, a lookup file is consulted to set an internal variable to a string unique to the vendor and product type. This variable is in turn used as a (separate) filter in a log path created for that specific vendor/product combination, ensuring the event is routed to the correct log path regardless of the payload contents. We will examine the link between the context filter, the lookup file, and the log path filter in an example below.
First, a caution: this level of customization requires some rudimentary understanding of syslog-ng syntax for the first time, as the rules themselves and the lookup they consult are “live” syslog-ng configurations which are exposed in the local directory tree. If these files are misconfigured, the syslog-ng startup parser will flag the errors, note them at startup, and abort. But fear not, there are well-populated example files that will serve as a template for most use cases, and we will walk through them now.
Let’s dig in. The primary context files are located in /opt/sc4s/local/context, and consist of two files:
vendor_product_by_source.conf
vendor_product_by_source.csv
The use of these files differs from the typical Splunk “default/local” convention that is used for splunk_metadata.csv. To properly use these files requires that they be complete (in other words, the “example” files are not consulted at all). Therefore, it is best to copy the example files to those with the regular extensions before you start using/editing them.
Here is the vendor_product_by_source.conf file:
You will see that it is a series of syslog-ng filters than can filter on any syslog-ng “macro” (field), but host and netmask are the only ones used because (as discussed earlier) these are the only fields which can be used with certainty before the payload is processed further. These filters (rules) are checked very early in the data path through SC4S, just after the rudimentary syslog parsers are run and before any payload filtering.
In this example, highlighted at line 11 is a filter for Juniper Netscreen devices with hostnames of jpns-* (the “glob” type indicates that this is a wildcard, and not a regex). The netmask in this case is commented out, but if it is known it can be easily applied. You will see that this file gives no indication as to what categorization should be applied — this is where the second (lookup) file, vendor_product_by_source.csv, comes in:
This lookup has the same 3-column structure as the splunk_metadata.csv file used for Splunk metadata, but the fields and values set are quite different and do not directly reference any Splunk metadata. The secret to these files is how the lookup key is used: you will see that the filter name in the conf file (line 11) will match one or more keys in the csv lookup (line 7). When the filter in the conf file “fires”, this link will set an internal field called fields.sc4s_vendor_product. This variable is in turn used in a simple filter that checks for the value of that variable in a log path created for that specific vendor/product combo, in this case Juniper Netscreen:
You may ask, why not just use the original filter in the main log path (shown immediately above) rather than the indirect approach using an internal field? The reason is to allow the administrator to set hostname/CIDR filters using a minimal amount of (repeatable) syslog-ng code, and more importantly to not require access to the main log path, which is inaccessible inside the container. In this case, line 43 in the main log path (hidden inside the container) never needs to change, but instead is driven by the filters and lookups in the vendor_product_by_source.* files.
There is a second set of context files that are configured similarly to the primary context filters described above. The primary difference is where the filter operation takes place — deep inside a log path, long after the vendor/product combination has been determined. It allows the administrator to configure (again, primarily based on hostname/CIDR block but can also include other syslog-ng macros (fields) that might have been created internally in the log path itself. These overrides (sub-filters) can be very useful in compliance use cases, where a subset of a given device type (sourcetype) must be tagged or redirected.
Again, there are two files in the same /opt/sc4s/local/context/ directory:
compliance_meta_by_source.conf
compliance_meta_by_source.csv
As with the first set of vendor/product context files discussed above, it is best to copy over the “example” versions first before you start. This set of files typically is far shorter than the first set as well, so it may be preferable to simply use the example files as a reference.
Here are the two compliance_meta_by_source files:
You will indeed see that this is dummy data, and also note one other interesting characteristic: the second column in the csv file is not limited to Splunk metadata but can be set to any syslog-ng macro. In fact, it can be used to set an arbitrary indexed field=value pair (e.g. compliance=pci) that can be sent to Splunk, which can be very useful for compliance needs. As part of this flexibility, however, Splunk metadata entries need to have the internally-used .splunk. prefix applied (e.g. .splunk.index) in order to override the index. These details are covered in the configuration section of the documentation.
Having now completed all configuration steps for existing data sources, we will now turn to our final task in the next installment (Part 4). We will cover the process of extending the SC4S platform with an exploration of the templating process, and the creation of new “log paths” to include data sources not supported “out of the box.” This will be the deepest level of SC4S customization; though syslog-ng experience is helpful, the templating process makes this exercise far easier as you will see!
Splunk Connect for Syslog is fully Splunk supported and is released as Open Source. We hope to drive a thriving community that will help with feedback, enhancement ideas, communication, and especially log path (filter) creation! We encourage active participation via the git repos, where formal request for feature (especially log path/filters) inclusion, bug tracking, etc. can be conducted. Over time, we envision far less “local filter” activity as more and more of the community’s efforts are encapsulate in the containers OOTB configs.
There are many resources available to enable your success with SC4S! In addition to the main repo and documentation, there are many other resources available:
We wish you the best of success with SC4S. Get involved, try it out, ask questions, contribute new data sources, and make new friends!
----------------------------------------------------
Thanks!
Mark Bonsack
The Splunk platform removes the barriers between data and action, empowering observability, IT and security teams to ensure their organizations are secure, resilient and innovative.
Founded in 2003, Splunk is a global company — with over 7,500 employees, Splunkers have received over 1,020 patents to date and availability in 21 regions around the world — and offers an open, extensible data platform that supports shared data across any environment so that all teams in an organization can get end-to-end visibility, with context, for every interaction and business process. Build a strong data foundation with Splunk.