As Splunk usage grows within organizations, Splunk administrators are challenged to meet end-user SLAs and the performance needs of an expanding user base amid ever-larger volumes of data. Different search and ingestion workloads compete for the same set of system resources within a deployment. A badly crafted search, or a new user running wildcard searches, can inadvertently hog all the system resources and derail business-critical searches.
The quick and naive response is often to over-provision hardware. More hardware may help in certain scenarios, but it is not a panacea for predictable search performance, and it certainly isn't a sustainable business model.
Wouldn't it be great if Splunk dynamically allocated system resources based on business priorities to deliver the most business value to the organization while providing levers to partition and protect business critical workloads?
Workload management in Splunk Enterprise 7.2 gives Splunk administrators explicit control over how system resources (CPU, memory) are allocated. It provides a policy-based mechanism to reserve those resources for ingestion and search workloads in alignment with business priorities. Administrators can classify workloads into different workload groups and reserve a portion of system resources for each group, regardless of the load on the system. This adds guardrails over system resource usage, preventing bad actors or an ill-conceived search from impacting business-critical operations.
By prioritizing critical search workloads over non-critical ones and preventing runaway searches from disrupting business-critical searches, workload management makes it more predictable to meet service SLAs. It provides a rule-based engine to partition system resources into workload pools and map those pools to Splunk apps and roles based on business rules. Administrators can therefore specify different resource allocations for search workloads by role or app, so that resources are used in accordance with business priorities. CPU- and memory-hungry applications can be isolated and constrained to a smaller portion of the available resources.
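To make the idea of pools and rules concrete, here is a minimal conceptual sketch in Python. It is not Splunk's implementation or configuration syntax: the pool names, weights, app and role names, and the first-match-wins evaluation order are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pool:
    """A hypothetical workload pool with a relative CPU weight and a memory cap."""
    name: str
    cpu_weight: int      # relative weight, not a percentage
    mem_limit_pct: int   # assumed cap on system memory, in percent


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule mapping an app and/or role to a pool."""
    pool: str
    app: Optional[str] = None
    role: Optional[str] = None

    def matches(self, app: str, role: str) -> bool:
        # A field left as None acts as a wildcard.
        return (self.app is None or self.app == app) and (
            self.role is None or self.role == role
        )


# Illustrative values only; none of these names come from Splunk.
POOLS = {
    "critical": Pool("critical", cpu_weight=60, mem_limit_pct=50),
    "standard": Pool("standard", cpu_weight=30, mem_limit_pct=30),
    "limited": Pool("limited", cpu_weight=10, mem_limit_pct=10),
}
RULES = [
    Rule(pool="critical", app="security_app"),
    Rule(pool="limited", role="trainee"),
]
DEFAULT_POOL = "standard"


def assign_pool(app: str, role: str) -> Pool:
    """Return the pool of the first matching rule, else the default pool."""
    for rule in RULES:
        if rule.matches(app, role):
            return POOLS[rule.pool]
    return POOLS[DEFAULT_POOL]


def cpu_share(pool_name: str) -> float:
    """Fraction of CPU a pool is entitled to when every pool is busy."""
    total = sum(p.cpu_weight for p in POOLS.values())
    return POOLS[pool_name].cpu_weight / total
```

In this sketch, a search from the hypothetical `security_app` lands in the `critical` pool no matter who runs it, while any other search run by a `trainee` role is confined to the `limited` pool and its smaller share.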
Workload management separates search and ingestion traffic, preventing heavy search loads from impacting data ingestion and thus reducing data lag. New users and new ingestion traffic can be onboarded predictably, without impacting existing users, while performance stays consistent.
In addition, workload management can re-assign resources on demand, so administrators can accelerate critical searches or throttle non-critical ones while they are still running.
Rather than over-provisioning hardware to meet performance requirements, workload management enables efficient utilization of existing resources, lowering TCO. A chargeback model based on system resource usage, as reported by workload management, can be adopted across teams and lines of business within an organization, enabling a better operational model.
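As a rough illustration of the chargeback idea, the sketch below splits a shared infrastructure cost across teams in proportion to the resources each consumed. The team names, the usage figures, and the choice of CPU-seconds as the metric are assumptions; the actual form in which usage is reported is not specified here.

```python
from typing import Dict


def chargeback(total_cost: float, usage_by_team: Dict[str, float]) -> Dict[str, float]:
    """Split total_cost across teams in proportion to their resource usage."""
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        # Nothing was consumed, so nothing is charged.
        return {team: 0.0 for team in usage_by_team}
    return {
        team: total_cost * used / total_usage
        for team, used in usage_by_team.items()
    }


# Hypothetical monthly figures: CPU-seconds consumed per team.
usage = {"security": 600.0, "it_ops": 300.0, "marketing": 100.0}
bill = chargeback(1000.0, usage)
```

With these made-up numbers, each team is billed for the same fraction of the cost as its fraction of total usage.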
Workload management not only improves the end-user search experience, but also impacts the organization’s bottom line. By efficiently and systematically managing workloads, the organizational resources are fully leveraged and capacity expansions can be more accurately determined without the need to over-provision resources.
----------------------------------------------------
Thanks!
Bharath Aleti