Here at Splunk, we often talk about best practices for creating log events, regardless of whether they are written to a file, sent to a network port, or emitted on the standard output of some program. Since this has been discussed before, I won’t enumerate those practices here, but I will allude to them for the purposes of this topic. Furthermore, adding useful information to your generated log events so they can be used in multiple contexts, a concept in line with semantic logging discussed in this video by Rob Das, complements the best practices. This works well for log events (or time series events; I use the two phrases interchangeably) that you generate yourself, but what about log events generated by third parties, where you have little control over their content?
I had a recent conversation with the Splunk Ninja, Michael Wilde, and Michael mentioned that it would be good if companies that use the time series data produced by their vendors told those vendors to follow the same logging best practices that they themselves try to follow. In other words, tell your vendor how to improve their logging practices if you think they need improvement. We no longer live in a world of antiquated traditions from the last century, where only developers and specialized support staff were expected to understand log events.
In today’s world, the log events produced by vendor products not only serve as a first line of support for understanding the behavior of the product, but in some cases they also provide deeper business understanding of the situation at hand. For instance, if a firewall vendor publishes deny events for a set of source IP addresses in much larger volume than usual, the business may not only conclude that a denial of service attack is under way, but can also use those IP addresses to block access at other perimeter gateways. Thus, improving the readability, KPI potential, indexing ease, and gathering capability of vendor-produced logs is in your best interest.
It doesn’t stop there. Improved log formats also benefit the vendors themselves. By handling fewer support calls, because problems can be diagnosed from root-cause hints in the logs, vendors can easily recoup their investment in making log events more usable. Moreover, if more customers start using the business context of the enriched log events, the vendors benefit from increased customer satisfaction and broader use of their products.
There are practices in creating log events that I think just cause problems for customers. Let me name a few.
No Year
Not having a timestamp for something that is inherently a time series event is of course a bad practice, but a more subtle one is having a timestamp and refusing to put a 4 digit year in it. This is a pet peeve of mine. It almost assumes that customers hold onto their log events for less than one year at a time, or that the event generation part of a vendor system wants to conserve 4 bytes per event. Please put the year in your timestamp.
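As a minimal sketch of what this can look like, here is Python's standard logging module configured to emit a timestamp with a full four-digit year on every event; the format string and field names are illustrative choices, not a prescribed standard:

import logging

# Minimal sketch: emit an ISO 8601 style timestamp with a four-digit year
# on every event, followed by key=value pairs.
logging.basicConfig(
    format="%(asctime)s level=%(levelname)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",  # e.g. 2012-05-14T09:30:12
    level=logging.INFO,
)

logging.info("user=alice action=login status=success")
# Produces something like:
# 2012-05-14T09:30:12 level=INFO user=alice action=login status=success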
Only Binary Events
Some vendors create log events in a binary (non-ASCII) format only. That, in itself, is an inconvenience, but the real problem starts when there is no easy way to convert these events to a human-readable ASCII format. This goes back to the point I made earlier: the vendor assumes that they will be the only consumers of the log events, being the only ones equipped to troubleshoot or comment on their product’s usage, which may not be true. Please provide tools that make binary log events readable to other systems as well as to human readers.
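To make the point concrete, here is a hedged Python sketch of the kind of small conversion tool a vendor could ship. The binary record layout used here (a 4-byte epoch timestamp, a 2-byte status code, and a 16-byte user field) is entirely hypothetical and simply stands in for whatever the vendor's real format is:

import struct
from datetime import datetime, timezone

# Hypothetical fixed-width record: big-endian uint32 epoch seconds,
# uint16 status code, 16 bytes of ASCII padded with NULs.
RECORD = struct.Struct(">IH16s")

def record_to_text(raw: bytes) -> str:
    epoch, status, user = RECORD.unpack(raw)
    ts = datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
    user_name = user.rstrip(b"\x00").decode("ascii")
    return f"{ts} status={status} user={user_name}"

# One fabricated record, converted to a human-readable key=value line.
sample = RECORD.pack(1336986612, 200, b"alice".ljust(16, b"\x00"))
print(record_to_text(sample))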
Multiple Log Formats in the Same File
I saw one application server log that had multiple event formats mixed into the middle of the log. This reminds me of the time I saw one company use a single file to hold log events from eight applications, each using its own format. Fortunately for Splunk, events can be matched with regular expressions to figure out their sourcetypes and line breaking, but would it not make more sense to use a separate file for each application? In Splunk, you can still correlate different sources when they live in separate files (see the sketch below for one way to pull a mixed file apart).
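If you are stuck with a mixed file, a small preprocessing step can still separate it. Here is a rough Python sketch that splits one file into per-application files using one regular expression per format; the patterns and application names are made up for illustration, and within Splunk you would more likely lean on sourcetype recognition rather than rewriting files:

import re

# Made-up patterns for two applications sharing one log file.
PATTERNS = {
    "web":  re.compile(r'^\d{4}-\d{2}-\d{2}T\S+ (GET|POST) '),
    "auth": re.compile(r'sshd\['),
}

def demux(mixed_path: str) -> None:
    # Open one output file per application, plus a catch-all.
    outputs = {app: open(f"{app}.log", "w") for app in PATTERNS}
    outputs["unknown"] = open("unknown.log", "w")
    with open(mixed_path) as mixed:
        for line in mixed:
            app = next((name for name, rx in PATTERNS.items() if rx.search(line)), "unknown")
            outputs[app].write(line)
    for handle in outputs.values():
        handle.close()

# demux("combined.log")  # writes web.log, auth.log, unknown.log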
Cryptic Formats Understood only by the Vendor
A customer told me about an IBM Mainframe log format where the first two bytes mean one thing, the presence of another byte indicates another meaning, the next four bytes mean something else, and so on. Obviously, this is not a human-friendly format. In the 1960s and 1970s, when memory and disk space were enormously expensive, squeezing every byte into the smallest possible space was a requirement. It is now the twenty-first century, and keeping that old tradition alive is not a good reason to avoid modernizing event formats.
The Splunk best practices page for log formats states that users should avoid XML and JSON formats because they may take longer to parse for indexing and are not necessarily easy to read. With the introduction of the spath command in Splunk 4.3, which extracts fields at will from XML and JSON, I may stray slightly from that advice. The caveat is that you should avoid indexing an XML document as one monolithic event. For instance, in the Vimeo Channels app, events come back in this format:
<Videos>
<Video>
...
</Video>
<Video>
...
</Video>
...
</Videos>
If you were to index this one return value from its REST API, you’ll end up with one large event per call and spend all your time combining spath and multi-value commands to do your searches. In this particular case, what I ended up doing was sending nothing at all to standard output for the Video and Videos lines and inserting my own "channel=some name" marker for each occurrence of a Video element. Then I used Splunk’s line-breaking capability to make sure each event would break cleanly at "channel=some name". At that point I had one event per video, making it very readable and easy to extract fields from dynamically.
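For the curious, here is a rough Python sketch of that preprocessing step. The element names are assumed for illustration (the real Vimeo response carries more fields); the idea is simply to drop the wrapper tags and print a channel= marker followed by each video's fields, so Splunk can break lines on the marker:

import sys
import xml.etree.ElementTree as ET

# Rough sketch: drop the <Videos>/<Video> wrapper lines and emit one
# "channel=..." marker per video, followed by its child elements as
# key=value pairs, so each video becomes its own event at index time.
def emit_events(xml_text: str, channel_name: str) -> None:
    root = ET.fromstring(xml_text)          # the <Videos> element
    for video in root.findall("Video"):
        sys.stdout.write(f"channel={channel_name}\n")
        for field in video:
            sys.stdout.write(f"{field.tag}={field.text}\n")

sample = "<Videos><Video><title>Demo</title><id>42</id></Video></Videos>"
emit_events(sample, "some name")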
Though I may have digressed with some hopefully useful tidbits on bad log formats, I still want to drive home the point Michael Wilde was making to me: customers should ask their vendors to create log formats that provide value to both themselves and the vendors. There is a chance they will listen, as it is profitable for all involved. And if the vendor’s log format does change and you have been using Splunk to analyze the data, you won’t have to worry too much, because the use of late binding to extract fields should ensure your searches and apps keep running.
I have not discussed the transport mechanism for getting access to log events, but that can be tabled for a future discussion. In closing, a world of vendors employing best practices for creating log events will ensure that future generations of Splunkers, such as little Maya below (and even non-Splunk users), have an easier time ingesting and analyzing machine-generated data.