Part of the beauty of Splunk IT Service Intelligence (ITSI) is that it provides users with flexible models of their entities and services. Additionally, Splunk ITSI can scale to support monitoring of thousands of services and tens of thousands of entities.
This blog post provides a sample of best practices for configuring a large-scale Splunk ITSI deployment. It's NOT a complete list of Splunk ITSI configuration guidelines; check out the Splunk ITSI Documentation for more in-depth information about that.
It's not good to have so many KPIs in a single service that you can barely keep track of them all. I’ve seen cases where the customer configured more than 50 KPIs in a service. How do you effectively monitor and troubleshoot the service when there are that many KPIs involved?
Part of the beauty of Splunk ITSI is that it makes it easy to focus on what matters in your environment. So spend time crafting and fostering the KPIs that you really care about and want to measure. You’ll save yourself time troubleshooting later.
So what is the recommended number of KPIs for a single service?
It’s best to have no more than 20 KPIs per individual service—more than enough to capture the key metrics you care about (like CPU, IO, disk free, and response time).
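As a rough illustration, the guideline above could be checked against an inventory of services and their KPIs. The sketch below is not an ITSI API: the `inventory` dictionary, the service names, and the helper function are all hypothetical, and it assumes you have exported the KPI names per service by some other means.

```python
# Guideline from this post (20 KPIs per service).
MAX_KPIS_PER_SERVICE = 20


def services_over_kpi_guideline(kpis_by_service, max_kpis=MAX_KPIS_PER_SERVICE):
    """Return {service_name: kpi_count} for services above the guideline."""
    return {
        name: len(kpis)
        for name, kpis in kpis_by_service.items()
        if len(kpis) > max_kpis
    }


# Hypothetical inventory: service name -> list of KPI names.
inventory = {
    "web-frontend": ["CPU", "IO", "Disk Free", "Response Time"],
    "legacy-batch": [f"kpi_{i}" for i in range(52)],
}

flagged = services_over_kpi_guideline(inventory)
print(flagged)
```

Any service that shows up in `flagged` is a candidate for pruning down to the KPIs you actually act on.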
Entity rules within a service dynamically filter to the entities that matter in your environment. Make your entity rules specific enough that they match only the entities you care about for that service. If service-level entity rules match tens of thousands of entities, it becomes difficult to monitor the entities of interest, and internal operations can slow down.
Recommendation:
Splunk ITSI does not limit the number of matching entities for a service. The recommendation is to be mindful of the performance implications when a large number of entities match a single service.
In Splunk ITSI, KPI base searches are recommended to minimize the overall search load at the Splunk Enterprise level.
Use the following guidelines to decide on the correct number of KPIs to be powered by a single KPI base search.
When configuring a KPI base search, consider the following recommendations:
Go to the search inspector and check the search execution stats. If the KPI base search is scheduled to run every minute but the actual search execution takes longer than a minute, the next scheduled search will be skipped. This will cause delayed KPI alert values and health score results, and means you have too many KPIs tied to a single KPI base search. Try reducing the number of KPIs.
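The check described above amounts to comparing each observed execution time with the schedule interval. Here is a minimal sketch, assuming you have copied a few execution times (in seconds) out of the search inspector by hand. The function name and the sample numbers are hypothetical and nothing here calls a Splunk API.

```python
def overrunning_runs(run_times_sec, interval_sec=60.0):
    """Return the run times that took longer than the schedule interval."""
    return [t for t in run_times_sec if t > interval_sec]


# Hypothetical execution times for a KPI base search scheduled every minute.
observed = [41.8, 57.3, 66.0, 72.5]

slow = overrunning_runs(observed)
if slow:
    print(
        f"{len(slow)} of {len(observed)} runs exceeded the interval; "
        "consider tying fewer KPIs to this base search"
    )
```

Even a single overrun means the next scheduled run can be skipped, so an empty result is the target.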
max_action_results = <integer>
* The maximum number of results to load when triggering an alert action.
* Default: 50000
How do I calculate the limit?
limit = (number of KPIs * number of entities for each service) + (number of services * 2)
Ex: A KPI base search is powering 5,000 KPIs across 500 services.
Each service is matching 10 entities.
limit = (5000 x 10) + (500 x 2) = 51000
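The formula can also be written as a small helper, which makes it easy to try different deployment sizes. This is only a sketch of the arithmetic: the function and parameter names are made up for illustration, and the default of 50000 is taken from the `max_action_results` description above.

```python
# Default from the max_action_results description in this post.
DEFAULT_MAX_ACTION_RESULTS = 50_000


def required_action_results(num_kpis, entities_per_service, num_services):
    """limit = (KPIs * entities per service) + (services * 2)."""
    return (num_kpis * entities_per_service) + (num_services * 2)


# Inputs from the example: 5,000 KPIs, 10 entities per service, 500 services.
limit = required_action_results(
    num_kpis=5000, entities_per_service=10, num_services=500
)
exceeds_default = limit > DEFAULT_MAX_ACTION_RESULTS

print(limit, exceeds_default)
```

With the example inputs, the two terms are 50,000 and 1,000, so the required limit is 51,000, which is above the default.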
How do I know how many KPIs are associated with a single KPI base search?
Starting with Splunk ITSI version 4.0.x, the ITSI Health Check dashboard provides these statistics.
Warning: Increasing this limit to a very large number can have a negative impact on the overall system, as more memory must be allocated to support the increased number of search results.
Recommendation:
Again, Splunk ITSI does not limit the number of KPIs that can be powered by a single KPI base search. Use the recommendations above to decide how many KPIs should share a single KPI base search in your environment.
Splunk IT Service Intelligence provides actionable insight into the performance and behavior of your IT operations, making it easy to effectively monitor your environment and provide value for your business. It allows you to see across silos and services for easier collaboration and real-time information about your IT and business health. By leveraging the best practices and recommendations in this blog post, you can successfully configure Splunk ITSI in a large-scale environment to meet the demands of your business.
This blog post is a collaborative work by Kan Wu, Keegan Dubbs, and Elizabeth Snyder.
----------------------------------------------------
Thanks!
Kan Wu