Important Update as of 6/5/2020: Splunk has released Splunk Connect for Syslog (SC4S), a solution for syslog data sources. More information can be found in our blog post, here.
As I mentioned in part one of this blog, I managed a sizable deployment of Splunk/syslog servers (2.5TB/day). I had 8 syslog-ng engines in 3 geographically separate data centers: Hong Kong, London and St. Louis. Each group of syslog-ng servers was load balanced with F5, and each group was sending traffic to its own regional indexers. Some of the syslog servers processed upward of 40,000 EPS (burst traffic). The recommendations I am about to describe here are what worked for me; your mileage may vary, of course. I tried optimizing the syslog-ng engines to get as much performance as possible out of them. If you feel, however, that this is overkill, or if you don't have the manpower to go through the tuning process, it may be easier to just add additional hardware and use the default settings.
With syslog-ng release 3.x, a new feature was introduced that allows you to dynamically include configuration files in the body of the main syslog-ng.conf. This is similar to the C language "include" or the Python "import".
To use this feature, just add a line like this to syslog-ng.conf:
@include "/etc/syslog-ng/buckets.d"
This feature enables you to create a main syslog-ng.conf file and then move all source-related configurations to a directory (let's call it buckets.d). By doing so you have effectively split your syslog-ng configuration into two parts: the static part, which contains the syslog-ng server-specific configuration (i.e. IP addresses, listening ports, socket conditioning, etc.), and the dynamic part, which is related to the source devices (hostnames, permissions, locations, filter rules, etc.). The static part does not change from server to server; once configured, you probably don't need to change it. The dynamic part (the buckets.d files) changes constantly, every time you add or remove a source host.
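To make this concrete, here is a minimal sketch of what the static part of a main syslog-ng.conf might look like. The version number, addresses and ports are placeholders; only the source names (s_network, s_local) are carried over from the examples that follow:
@version: 3.5
options { use_dns(no); dns_cache(no); keep_hostname(yes); };
# Static sources referenced by the filter-files in buckets.d
source s_network { udp(ip(0.0.0.0) port(514)); tcp(ip(0.0.0.0) port(514)); };
source s_local { internal(); };
# Pull in the dynamic, per-source configuration
@include "/etc/syslog-ng/buckets.d"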
Sample buckets.d filter-file:
destination d_firewalls { file("/syslog/FIREWALLS/$SOURCEIP/$SOURCEIP.log" owner(syslog-ng) group(splunk) perm(0755) dir_perm(0755) create_dirs(yes)); };
filter f_firewalls { match("%ASA-" value("MSG")) or match("%ASA-" value("MSGHDR")) or match("%FWSM-" value("MSG")) or match("%FWSM-" value("MSGHDR")) or match("%PIX" value("MSG")) or match("%PIX-" value("MSGHDR")) and not netmask("10.96.50.13/32"); };
log { source(s_network); filter(f_firewalls); destination(d_firewalls); };
To make your syslog-ng configuration modular, create as many filter-files as you want. Each filter-file should contain the configuration for an individual group of sources. Then periodically sync the "buckets.d" directory across all of your syslog-ng servers.
My source devices (the dynamic part) are not the same across all of these data centers. So why am I syncing filter-files I wouldn't use, you ask? Good question. The answer is ease of administration. By synchronizing buckets.d you don't need to worry about which source lives where. My intention was to create a universal set of filter-files that would work in any data center; the simplicity of management outweighed the clutter in this case. From that point on, every time you restart syslog-ng, the entire contents of buckets.d along with the main syslog-ng.conf will appear as one single configuration file to the syslog-ng daemon.
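One simple way to keep the directory in sync is a scheduled rsync push from a designated master. This is just a sketch; the host name and paths are placeholders, and the reload step assumes you want the peer to pick up the new filter-files immediately:
# Push the filter-files from the master to a peer (run from cron, for example)
rsync -av --delete /etc/syslog-ng/buckets.d/ syslog02.example.com:/etc/syslog-ng/buckets.d/
ssh syslog02.example.com 'kill -HUP $(pidof syslog-ng)'   # reload so the new filter-files are picked up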
Keyword naming convention:
As with Hungarian notation (https://en.wikipedia.org/wiki/Hungarian_notation), I strongly recommend using the following naming convention to make your configuration easy to read and follow:
d_ for destinations
f_ for filters
s_ for sources
destination d_damballa { file("/syslog/DAMBALLA/$SOURCEIP/$SOURCEIP.log"); };
filter f_damballa { netmask("10.63.1.1/32"); };
log { source(s_network); filter(f_damballa); destination(d_damballa); };
Turn on statistical gathering:
Turn on statistical gathering in syslog-ng. It will give you visibility into the engine's operation: you will be able to see how many events per source are being collected. This information is critical for capacity planning and performance tuning.
destination d_logstats { file("/home/syslog-ng/logstats/logstats.log" owner(syslog-ng) group(splunk) perm(0644) dir_perm(0750) create_dirs(yes)); };
filter f_logstats { match("Log statistics;" value("MSGHDR")) and match("d_windows" value("MSGHDR")); };
log { source(s_local); filter(f_logstats); destination(d_logstats); };
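A couple of notes on this example: the statistics messages come in on syslog-ng's internal() source (s_local here), and how often they are emitted is controlled by the global stats_freq() option. The interval below is only an example:
options { stats_freq(600); };   # emit "Log statistics" messages every 10 minutes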
Watch for file permissions:
Make sure the syslog-ng process can READ the buckets.d directory and can READ/WRITE the log directories. Also make sure that the splunkd daemon has full READ access to the log files (and their parent directories).
file("/syslog/MSSQL/$SOURCEIP/$SOURCEIP.log" owner(syslog-ng) group(splunk) perm(0755) dir_perm(0755) create_dirs(yes));};
Watch for UDP packet drops on the syslog server:
Many new syslog-ng admins don't pay attention to this item. They simply assume things will work. For the most part that is true, but in a high-volume environment UDP traffic drops are unavoidable. You will start to hear some users complaining about "missing events," and they will probably blame Splunk for it. So do yourself a favor and monitor UDP packet drops on the interface. Use whatever tool you are comfortable with. And yes, there is a Splunk app for that: https://splunkbase.splunk.com/app/2975/#/overview
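For a quick manual check on Linux, the kernel's own UDP counters will tell you whether packets are being dropped before syslog-ng ever sees them (the exact field names vary slightly by distribution):
netstat -su | grep -A6 '^Udp:'     # look for "packet receive errors" and "receive buffer errors"
cat /proc/net/snmp | grep '^Udp:'  # same counters, without the net-tools package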
Syslog-ng has several tuning parameters to achieve higher "ingestion" or "capture" rates. Please be aware that if you use large values, some of these configurations may require adjustments to your kernel (/etc/sysctl.conf). For full details on all available syslog-ng options, please consult https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/index.html?_ga=1.115117060.204896635.1456724547
Here are some configuration options that you can use:
Set the receiving buffer size:
You can control the size of the receiving buffer using so_rcvbuf(). Incoming events are queued in memory before they are written to disk. While a large buffer will improve capture speed, it may also have the undesirable side effect of timestamp skew: syslog-ng timestamps events when they are written to disk, not when the network card receives them. In my environment I managed to get over 5 minutes of timestamp skew just by creating too large a buffer. If your events have multiple timestamps (one added by syslog-ng and one added by the source device), you can probably instruct Splunk to use the second timestamp. Again, use with caution!
udp( ip(10.16.128.93) port(2514) so_rcvbuf(805306368) so_sndbuf(8096) time_zone(GMT) keep_timestamp(no) );
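This is also where the kernel tuning mentioned earlier comes in: a so_rcvbuf() value this large only takes effect if the kernel allows it. On Linux that typically means raising the socket buffer limits in /etc/sysctl.conf; the numbers below are illustrative, not recommendations:
# /etc/sysctl.conf - allow applications to request large UDP receive buffers
net.core.rmem_max = 805306368
net.core.rmem_default = 33554432
# apply without a reboot
sysctl -p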
Use multiple sockets:
The term "network socket" refers to the combination of port number and IP address. One way of enhancing syslog performance is splitting your incoming log traffic across multiple sockets (or channels). For example, in my environment I configured syslog-ng to listen on UDP/2514 for the firewalls, TCP/2515 for VMware logs, and so on. You can also utilize multiple IPs if you have them. The idea here is to distribute the load among multiple channels. However, before you rush into opening multiple sockets, make sure you have exhausted the existing one(s). There is no need to complicate your design just because you can. Simple is always elegant!
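As a rough sketch, the split described above might look like this; the IP address, ports, connection limit and source names are examples only:
source s_firewalls { udp( ip(10.16.128.93) port(2514) ); };
source s_vmware    { tcp( ip(10.16.128.93) port(2515) max-connections(500) ); };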
Allow TCP logging:
Many network devices can only be configured to use UDP/514 for logging. But try to enable TCP logging whenever possible. The advantage is reliability of the transport protocol.
tcp ( ip(10.16.128.93) port(2514) ) ;
Set max-connections the socket can handle:
The objective here is to prevent a single source that has "gone wild" from overwhelming the channel. Start with a large number, then tune it down based on your environment's "normal" activity.
tcp ( ip(10.16.128.93) port(514) max-connections(5000) ) ;
Turn off DNS name resolution:
Unless you really need it, I recommend filtering by IP address and not attempting to look up DNS hostnames. However, if you must have it, then look into running DNS cache-only servers: http://www.tecmint.com/install-caching-only-dns-server-in-centos/ . Please remember that syslog-ng can do DNS caching of its own, so again, do not rush into enabling DNS caching in the OS or syslog-ng unless you really need it. In my experience, enabling DNS caching in syslog-ng.conf is sufficient to reduce DNS traffic (out of the server).
use_dns(no); dns_cache(no);
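If you do decide to let syslog-ng resolve and cache hostnames itself, the relevant options look like the following; the cache size and expiry values are just examples to size for your own environment:
options { use_dns(yes); dns_cache(yes); dns_cache_size(2000); dns_cache_expire(3600); };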
Explicitly define logging templates:
The advantage is that you will be able to control how the log message is formatted in case you need to forward the events to another syslog server or a third-party tool. The syslog-ng engine, much like Splunk, can act as a log router (aka a syslog hub). There are a few things you need to worry about when you configure your syslog-ng as a hub; watch for chain_hostname() and keep_hostname().
template t_default { template("${DATE} ${HOST} ${MSGHDR}${MSG}\n"); };
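As a sketch of putting the template to use, it can be attached to a forwarding destination like this; the destination name and target address are hypothetical:
destination d_forward { tcp("10.0.0.5" port(514) template(t_default)); };
log { source(s_network); destination(d_forward); };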
Fast filtering:
Creating filters by IP address is faster than filtering by keywords in the message body (MSG) or header (MSGHDR). If you're trying to achieve higher capture capacity, you should look into this option.
Having said this, there might be a need to use keyword filtering. In my case I had an environment with 500+ firewalls. It was very easy to identify all possible unique Cisco ASA keywords found in ASA logs; the alternative would have been listing every single IP in the configuration file. I opted for ease of management this round. Additionally, avoid using regex in your syslog-ng configuration; Splunk is better suited for that task.
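For comparison, if the firewalls all lived in a predictable address range, the keyword filter from the earlier example could be replaced with a single, cheaper IP-based filter; the name and subnet below are hypothetical:
filter f_firewalls_by_ip { netmask("10.96.0.0/16"); };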
There are many ways to test your configuration. The best way is to use a traffic generator like IXIA, since you can really push massive amounts of traffic. My next favorite tool is loggen by Balabit (which is part of the syslog-ng distribution). With this tool you can stress test your syslog-ng installation by specifying the rate of syslog messages. You can also test using the TCP or UDP protocols. For a more realistic simulated test, you can utilize a sample input file (e.g. a Cisco ASA log). Using a real-world sample file is also very useful for testing your filtering rules.
As the Balabit documentation notes, when loggen finishes sending the messages it displays summary statistics for the run, such as the total message count and the average message rate.
Example loggen commands:
You can send data to your syslog server from an input file to generate more realistic traffic:
loggen --inet --dgram --read-file cisco_asa.log 10.0.0.1 514
The following command generates 1,000 messages per second for ten minutes and sends them to port TCP/514 on host 10.0.0.1. Each message is 500 bytes long.
loggen --inet --stream --size 500 --rate 1000 --interval 600 10.0.0.1 514
In conclusion, as you can see, syslog-ng is a very flexible and well-designed open source tool. It can be a critical part of your Splunk deployment. You need to decide how far you want to take it. Weigh all your options, as every environment is different. Seek simplicity as much as possible, but don't shy away from being on the bleeding edge if it makes sense. And finally, don't assume anything: test your configuration and monitor your deployment. As always, I welcome your comments and feedback!
Back to part 1: http://blogs.splunk.com/2016/05/05/high-performance-syslogging-for-splunk-using-syslog-ng-part-1/
----------------------------------------------------
Thanks!
Mohamad Hassan