Splunk Enterprise 6.3 added Schedule Windows, a feature that lets the Search Scheduler distinguish searches that really should run at a specific time (just like cron) from those that don't have to, thereby greatly reducing lag and skipping. Splunk Enterprise 6.6 adds Schedule Skewing, which lets the Search Scheduler randomly distribute scheduled searches more evenly over their periods.
What’s the practical difference? When should you use one vs. the other? I’ll explain. But first, I’ll review each feature separately.
There are a few terms used when discussing the Search Scheduler:
Background
As mentioned, schedule windows allow the Search Scheduler to distinguish searches that really should run at a specific time (e.g. every hour, as close to the top of the hour as possible) from those that don't have to (e.g. approximately every hour, where the exact time within the hour is not critical). Hence, giving a search a window is altruistic: it helps other searches. In savedsearches.conf, the parameter is specified as:
schedule_window = window-in-minutes | auto
where window-in-minutes, when greater than 0, specifies the window of time during which the search will be altruistic, i.e. have a higher (worse) priority score and allow other searches to run first. (However, if at any time there is sufficient capacity to run the search, it will be run.) If the search instance hasn't run by the time the window expires, the scheduler treats it from that point on as if it never had a schedule window (until it either finally runs or is skipped).
The auto value tells the scheduler to calculate the window of time automatically based on the historical run times of the search. For example, if a search runs every five minutes and has historically taken approximately twenty seconds to run, then it can be deferred by at most four minutes and forty seconds and still complete within its five-minute period, so that is the auto value.
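The arithmetic in that example can be written out as a short Python sketch. It only reproduces the subtraction described above; the function name is made up for illustration, and how the scheduler actually derives the historical run time is not covered here.

```python
def auto_window_seconds(period_s: float, typical_runtime_s: float) -> float:
    """Longest deferral that still lets the search finish within its period.

    Illustrative only: mirrors the worked example in the text, not
    the scheduler's internal calculation.
    """
    return max(0.0, period_s - typical_runtime_s)


# A search that runs every five minutes and typically takes twenty seconds.
window = auto_window_seconds(period_s=5 * 60, typical_runtime_s=20)
minutes, seconds = divmod(int(window), 60)
print(f"deferrable by up to {minutes} min {seconds} s")  # 4 min 40 s
```

The `max(0.0, ...)` guard is an assumption for the case where a search takes longer than its period, in which case there is no room to defer it at all.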
To illustrate a use-case for schedule windows, suppose you have a mixture of searches: some run frequently (say every 5 minutes or even every minute) and some run less frequently (say once an hour or even once a day). At times when many of those searches' scheduled times align on a Splunk deployment with insufficient capacity to run them all concurrently, the searches with a schedule window will allow other, more important searches to run first.
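To make that behavior concrete, here is a deliberately simplified Python toy model. It is not the scheduler's algorithm: the names `DueSearch` and `pick_to_run` are hypothetical, and real priority scoring involves more factors than shown. It only captures the two rules described above: a search still inside its window yields to others, and it runs anyway whenever there is spare capacity.

```python
from dataclasses import dataclass


@dataclass
class DueSearch:
    name: str
    window_minutes: int       # 0 means no schedule window
    minutes_overdue: int = 0  # time elapsed since the scheduled time


def pick_to_run(due: list[DueSearch], free_slots: int) -> list[str]:
    """Toy model: searches still inside their window go to the back of the line."""

    def in_window(s: DueSearch) -> bool:
        return s.window_minutes > 0 and s.minutes_overdue < s.window_minutes

    # sorted() is stable, and False sorts before True, so searches that are
    # not (or no longer) in a window keep their order and come first.
    ordered = sorted(due, key=in_window)
    return [s.name for s in ordered[:free_slots]]


due = [
    DueSearch("hourly_report", window_minutes=30),
    DueSearch("alert_every_minute", window_minutes=0),
    DueSearch("window_expired", window_minutes=10, minutes_overdue=12),
]
print(pick_to_run(due, free_slots=2))  # the windowed hourly report waits
print(pick_to_run(due, free_slots=3))  # enough capacity: everything runs
```

With only two free slots, the hourly report steps aside; with three, it runs immediately despite having a window.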
Before & After
To illustrate the benefit of schedule windows, here are some “before” vs. “after” scheduler performance charts.
Things to notice:
Here is the “after” set of scheduler performance charts.
Things to notice:
Background
As mentioned, schedule skewing allows the Search Scheduler to randomly “skew” a set of searches’ scheduled times more evenly over their periods. In savedsearches.conf, the parameter is specified as:
allow_skew = percentage% | duration
where:
To illustrate a use-case for skewing, suppose you have very many searches that run for only a few seconds every minute. Despite having very many searches, your Splunk deployment can run all the searches simultaneously. However, the network bandwidth those searches use simultaneously exceeds the capacity of your switches, and just a few seconds later, when all the searches have completed, the network bandwidth drops back close to zero. Since your Splunk deployment can run all the searches simultaneously, this isn't a problem that schedule windows can solve. What you want is to spread the dispatching of the searches out over time to decrease the network saturation. This is precisely what skewing does.
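The effect can be sketched with a small Python simulation. This is a toy model under stated assumptions, not how the scheduler chooses skew: the function names are hypothetical, the skew is expressed as a fraction of the period, and each search simply gets a random offset within that fraction.

```python
import random
from collections import Counter


def dispatch_offsets(num_searches: int, period_s: int,
                     skew_fraction: float, seed: int = 0) -> list[int]:
    """Second-within-the-period at which each search is dispatched (toy model)."""
    rng = random.Random(seed)
    max_skew = int(period_s * skew_fraction)
    if max_skew == 0:
        # No skew: every search fires at the top of the period.
        return [0] * num_searches
    return [rng.randrange(max_skew) for _ in range(num_searches)]


def peak_per_second(offsets: list[int]) -> int:
    """Largest number of searches dispatched in the same second."""
    return max(Counter(offsets).values())


before = dispatch_offsets(600, period_s=60, skew_fraction=0.0)
after = dispatch_offsets(600, period_s=60, skew_fraction=1.0)
print("peak dispatches per second without skew:", peak_per_second(before))
print("peak dispatches per second with skew:   ", peak_per_second(after))
```

Without skew, all 600 hypothetical searches are dispatched in the same second; with the skew spanning the whole period, the same work is spread across the minute and the per-second peak drops sharply.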
A few things to note about skewing are:
Before & After
To illustrate the benefit of schedule skewing, here is one “before” vs. “after” scheduler performance chart.
The thing to notice is that the majority of searches run every minute at the top of the minute, saturating the network.
Here is the “after” scheduler performance chart.
The thing to notice is that the searches are now spread much more evenly over time, thus reducing the network load.
Now that each feature has been explained, when should you use one versus the other?
Want to learn more? Check out the slides and recording of my .conf2017 session "Making the Most of the Splunk Scheduler."
----------------------------------------------------
Thanks!
Paul Lucas