In the last installment of this blog series, we discussed how to configure Splunk Workload Management for a complex deployment and how to reduce the execution time of high-priority searches. In this final blog of the series, I will describe the new workload management features in Splunk Enterprise 8.0, which was released at .conf19, and give a usage example.
Auto-monitoring and remediation
A common issue that many Splunk customers encounter is a rogue or poorly written search that impacts high-priority searches. Identifying such searches manually and responding quickly is impractical, so we have introduced automatic monitoring and remediation in Workload Management. You can now create rules that monitor search runtime and take a predefined remediation action. The rules are evaluated every 10 seconds, and when a search triggers a rule, the corresponding action is taken.
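As a rough sketch, a monitoring rule of this kind can be expressed in workload_rules.conf. The rule name below is a placeholder and the exact predicate syntax should be confirmed against the workload_rules.conf spec for your Splunk version:

```ini
# workload_rules.conf -- illustrative monitoring rule
[workload_rule:abort_long_running_adhoc]
# match ad hoc searches that have run longer than 10 minutes
predicate = search_type=adhoc AND runtime>10m
# remediation action taken when the rule triggers
# (rules are evaluated every 10 seconds)
action = abort
```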
Richer workload rule framework
In Splunk Enterprise 7.3, you can create workload rules using attributes such as index, role, user, and app. We have extended this framework to include search type (ad hoc, scheduled, DMA, etc.), search mode (historical or real-time), and search time range (all time). In addition, you can now use index=* as part of a workload rule. This rich set of attributes gives you the flexibility to create granular workload rules.
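The new attributes can be combined in rule predicates. The following sketch shows two such rules; the rule and pool names are placeholders, and the syntax should be checked against the workload_rules.conf spec:

```ini
# workload_rules.conf -- illustrative rules using the extended attributes
[workload_rule:realtime_adhoc_to_low]
# route real-time ad hoc searches to a low-priority pool
predicate = search_mode=realtime AND search_type=adhoc
action = move
workload_pool = low_priority_pool

[workload_rule:alltime_to_low]
# catch all-time searches across any index
predicate = search_time_range=alltime AND index=*
action = move
workload_pool = low_priority_pool
```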
Note that a monitoring rule can also be used to throttle searches by moving them to a low-priority pool that is allocated only a small amount of CPU. This feature allows hands-free management and reduces the impact and surface area of poorly written searches.
Schedule-based rules
You often need workload rules that apply only during business hours or peak hours. You can now enable workload rules based on a time schedule, creating one set of rules for peak hours and another for off-peak hours. For example, you can create a rule that allows ad hoc searches from privileged users to run in the high-priority pool during business hours.
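A schedule-based rule for that example might look like the sketch below. The rule, role, and pool names are placeholders, and the schedule itself is easiest to attach through the Workload Management UI; any schedule-related setting names in workload_rules.conf should be taken from the spec file for your version rather than from this sketch:

```ini
# workload_rules.conf -- illustrative; attach a business-hours schedule to
# this rule via the Workload Management UI (or per the workload_rules.conf
# spec for your version)
[workload_rule:privileged_adhoc_business_hours]
# ad hoc searches from a privileged role go to the high-priority pool
predicate = role=power_users AND search_type=adhoc
action = move
workload_pool = high_priority_pool
```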
Now let's walk through a simple usage example in which you want to achieve a few objectives through workload management: guarantee resources for data ingestion, prioritize important searches, and contain poorly written ones. In this scenario, you have a single search head and an indexer cluster.
First, we allocate resources to the Ingest and Search categories on the search head and all indexers. Then, we track the peak CPU usage of the Ingest pool on the indexers for the next few days or weeks through the Workload Management page in the Monitoring Console, and adjust the CPU limits on all indexers if peak CPU usage rises above the initially configured 30% allocation. Note that in this scenario we have not configured the Misc pool, which is only required if you want to isolate modular and scripted inputs from the rest of the workload.
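The category split can be sketched in workload_pools.conf as follows; the weights shown are the starting values from this scenario, not recommendations, and should be tuned based on observed peak usage:

```ini
# workload_pools.conf on the search head and all indexers -- illustrative
[workload_category:ingest]
# start with roughly 30% of CPU reserved for indexing;
# revisit after observing peak usage in the Monitoring Console
cpu_weight = 30
mem_weight = 100

[workload_category:search]
cpu_weight = 70
mem_weight = 100
```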
Next, we configure the search pools. The high-priority pool is given the most CPU resources to ensure timely execution of its searches; if that CPU is unused, it is shared with other workloads. Memory is shared across all searches.
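A possible pool layout is sketched below. The pool names and weights are placeholders chosen to reflect the description above (most CPU to the high-priority pool, memory shared evenly); cpu_weight values are relative shares, so unused CPU flows to the other pools:

```ini
# workload_pools.conf -- illustrative search pools
[workload_pool:high_priority]
cpu_weight = 60
mem_weight = 100
category = search
default_category_pool = false

[workload_pool:standard]
cpu_weight = 30
mem_weight = 100
category = search
default_category_pool = true

[workload_pool:low_priority]
cpu_weight = 10
mem_weight = 100
category = search
default_category_pool = false
```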
Finally, we create workload rules to place searches into the appropriate pools and to monitor for runaway searches.
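For illustration, a minimal rule set for this scenario might look like the following; the rule names, roles, and pool names are placeholders, and rules are matched in order:

```ini
# workload_rules.conf -- illustrative rule set for this scenario
[workload_rule:admin_to_high]
# searches from the admin role run in the high-priority pool
predicate = role=admin
action = move
workload_pool = high_priority

[workload_rule:alltime_to_low]
# all-time searches are throttled in the low-priority pool
predicate = search_time_range=alltime
action = move
workload_pool = low_priority

[workload_rule:abort_runaways]
# abort any search that has run longer than 30 minutes
predicate = runtime>30m
action = abort
```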
Workload Management allows you to achieve your business objectives by aligning resources with business priorities through a simple rule-based framework. It is generally a good idea to start with a single use case, confirm that you are getting the expected benefits, and then expand to other use cases.
Check out this video from .conf19 on the new workload management features. If you have more questions about solving your use case with workload management, please reach out to your Splunk point of contact or join the #workload_management channel on splunk-usergroups Slack.