This blog post is part 26 of the “Hunting with Splunk: The Basics” series, which takes a single Splunk search command or hunting concept and breaks it down to its basic parts.
If you’re like me, you’ve occasionally found yourself staring at the Splunk search bar trying to decide how best to analyze a series of data, iterating against one or more fields.
If your brain gravitates towards traditional programming syntax, the first thing that pops into your mind may be to reach for a for or while loop (neither of which works in SPL the way it does in a conventional programming language). With commands like stats, streamstats, eventstats, or foreach at your disposal, which one should a hunter use?
Well, it depends on the data and the required outcome. For example, let’s say we want to calculate the total distance travelled by a salesperson or an escaped toad. The data may contain waypoint information that requires iterative calculation, such as latitude and longitude (or, in some cases, this enrichment may be extracted from the source data, such as with the iplocation command).
Enter autoregress. Sounds fancy. But here’s the thing: the autoregress command prepares your events for calculating a moving average (or an autoregression) by copying one or more previous values of a field into each event. Here is a link to the Splunk docs description of the autoregress command. Go ahead and check it out; we’ll wait.
Finished? Awesome. Let’s talk about practical applications.
The autoregress command is a centralized streaming command: it applies a transformation to each event returned by a search, but it runs only on the search head.
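If it helps to see the idea outside of SPL, here is a minimal Python sketch of what autoregress does to a field: it copies the value from p events back into the current event. The function name, the sample events, and the two-point moving average are illustrative assumptions, not Splunk’s implementation; the `<field>_p<p>` naming mirrors the default the Splunk docs describe when no AS clause is given.

```python
from typing import Any


def autoregress(events: list[dict[str, Any]], field: str, p: int = 1) -> list[dict[str, Any]]:
    """Copy `field` from p events back into each event as <field>_p<p>.

    Events that have no predecessor p steps back get None.
    """
    out = []
    for i, event in enumerate(events):
        row = dict(event)  # leave the input events untouched
        row[f"{field}_p{p}"] = events[i - p][field] if i >= p else None
        out.append(row)
    return out


# Hypothetical sample events, in the order the search returns them.
events = [{"count": 10}, {"count": 20}, {"count": 60}]
lagged = autoregress(events, "count")

# With the previous value sitting next to the current one,
# a two-point moving average is a one-line calculation per event.
moving_avg = [
    None if row["count_p1"] is None else (row["count"] + row["count_p1"]) / 2
    for row in lagged
]

for row, avg in zip(lagged, moving_avg):
    print(row, avg)
```

The point of the sketch is that autoregress itself does no averaging; it only places earlier values alongside the current one so that a later step can do the math.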
You might be saying to yourself, “Self, I’ve never heard of this command before.” Well, you’re not alone. It’s not new, but not particularly well known. Kyle Smith of Aplura, LLC, included autoregress in his .conf2016 talk, “Lesser Known Search Commands”. Unlike iterative commands, such as map or foreach, the autoregress command is a statistical command (in the same family as the widely used stats and tstats commands).
Kyle expands on the definition: “a Moving Average is a succession of averages calculated from successive events (typically of constant size and overlapping) of a series of values.”
Let’s say we’re planning a road trip to visit some of the top craft breweries in the Mid-Atlantic United States, and we’ve fed that data into Splunk. We want to compute the distance between waypoints and the total distance we’re traveling (so we know how much fuel to put into our personal jetpack). We apply autoregress to both latitude and longitude so that each event carries the previous waypoint’s coordinates, then perform any further applicable calculations, such as `globedistance()` or streamstats.
Once you’ve pulled the relevant fields, your command may look something like this:
… | autoregress lat as prev_lat | autoregress lon as prev_lon | `globedistance(lat,lon,prev_lat,prev_lon,units)` | streamstats sum(distance) AS totaldistance
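For readers who think in code, the pipeline above can be sketched in Python. This assumes the `globedistance` macro computes a great-circle distance (its definition is not shown in this post), so the haversine formula stands in for it. The waypoints, the Earth radius constant, and the function names are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def running_distance(waypoints: list[tuple[float, float]]) -> list[dict]:
    """Mimic: autoregress lat/lon -> per-leg distance -> running sum."""
    rows = []
    total = 0.0
    prev = None  # plays the role of prev_lat / prev_lon
    for lat, lon in waypoints:
        if prev is None:
            distance = None  # the first waypoint has no previous event
        else:
            distance = haversine_km(prev[0], prev[1], lat, lon)
            total += distance
        rows.append({
            "lat": lat, "lon": lon,
            "prev_lat": None if prev is None else prev[0],
            "prev_lon": None if prev is None else prev[1],
            "distance": distance,
            "totaldistance": total,  # stays 0.0 on the first row in this sketch
        })
        prev = (lat, lon)
    return rows


# Hypothetical waypoints: one degree east along the equator, then one degree north.
rows = running_distance([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
for row in rows:
    print(row)
```

Each leg needs both the current and the previous coordinates in the same row, which is exactly the gap autoregress fills before the distance calculation and the running sum take over.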
Here’s an example:
As shown above, the autoregress command may help you gather the information where commands like stats, streamstats, eventstats, or foreach alone aren’t necessarily suitable. If you’re like me, you should have no regrets adding the autoregress command to your SPL utility belt.
We invite you to join us for the Sixth Annual Boss of the SOC premiering at .conf21, where you’ll have the chance to buckle up and flex your Splunk super powers.
Happy hunting!