Disk Space Estimator for Index Replication

By Mustafa Ahamed

One of the first questions customers ask when they start considering index replication is about storage requirements. Index replication keeps additional copies of data for redundancy, so the main questions are how it affects storage needs and which factors matter when designing a scalable storage architecture. I’ll cover the important factors in this blog post.

There are two major dimensions to consider: the replication policy and the data retention period.

The Replication Factor (RF) and Search Factor (SF) control the replication policy. RF determines how many copies of the raw data the cluster keeps, while SF determines how many of those copies also carry time-series index (tsidx) files and are therefore searchable. For syslog data, the compressed raw data files take about 15% of the incoming data volume and the index files take about 35%.
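To make the role of each factor concrete, here is a minimal Python sketch of how much disk space one day of data occupies across the whole cluster. It assumes the approximate 15% raw data and 35% index file ratios for syslog data; the function name `daily_footprint_gb` and its parameters are illustrative, not part of any Splunk API. It also rejects an SF larger than the RF, because Splunk requires the search factor to be no greater than the replication factor.

```python
RAW_RATIO = 0.15    # compressed raw data as a fraction of incoming data (syslog estimate)
INDEX_RATIO = 0.35  # index (tsidx) files as a fraction of incoming data (syslog estimate)


def daily_footprint_gb(daily_gb: float, rf: int, sf: int) -> float:
    """Disk space (GB) that one day of incoming data occupies across the cluster."""
    if not 1 <= sf <= rf:
        raise ValueError("search factor must be between 1 and the replication factor")
    raw_gb = RAW_RATIO * daily_gb * rf      # every replicated copy keeps the raw data
    index_gb = INDEX_RATIO * daily_gb * sf  # only searchable copies keep index files
    return raw_gb + index_gb


print(daily_footprint_gb(200, rf=2, sf=2))  # RF = SF = 2
print(daily_footprint_gb(200, rf=3, sf=2))  # one extra, non-searchable copy
```

Raising RF alone only adds raw data copies, which are the cheaper component; raising SF adds the larger index files.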

The second dimension is the retention period, which determines how long Splunk keeps data before aging it out. Typical retention policies range from 3 to 6 months, although we have seen cases where the retention period is measured in years.

Let’s walk through an example to see these numbers in action. Assume a daily indexing volume of 200 GB, RF and SF both set to 2, a 2-node cluster, and a retention period of 45 days.

Raw data storage = 15% * 200 GB * 2 (RF) * 45 days = 2,700 GB = 2.7 TB

Index file storage = 35% * 200 GB * 2 (SF) * 45 days = 6,300 GB = 6.3 TB

Total space required on the cluster to store 45 days of data = 2.7 + 6.3 = 9 TB (taking 1 TB = 1,000 GB)

Space required on each peer = 9 / 2 = 4.5 TB, assuming data is spread evenly across both peers.
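The same arithmetic can be wrapped in a small function so the inputs are easy to change. This is a sketch under the same assumptions as the example (15% raw data, 35% index files, decimal terabytes of 1,000 GB, data spread evenly across peers); `estimate_storage` and `StorageEstimate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

RAW_RATIO = 0.15    # raw data share of incoming volume (syslog estimate)
INDEX_RATIO = 0.35  # index file share of incoming volume (syslog estimate)
GB_PER_TB = 1000    # decimal terabytes


@dataclass
class StorageEstimate:
    raw_tb: float
    index_tb: float
    total_tb: float
    per_peer_tb: float


def estimate_storage(daily_gb: float, rf: int, sf: int,
                     retention_days: int, peers: int) -> StorageEstimate:
    """Cluster-wide and per-peer disk space needed to retain `retention_days` of data."""
    raw_tb = RAW_RATIO * daily_gb * rf * retention_days / GB_PER_TB
    index_tb = INDEX_RATIO * daily_gb * sf * retention_days / GB_PER_TB
    total_tb = raw_tb + index_tb
    # Assumes buckets and their replicas are distributed evenly across peers.
    return StorageEstimate(raw_tb, index_tb, total_tb, total_tb / peers)


est = estimate_storage(daily_gb=200, rf=2, sf=2, retention_days=45, peers=2)
print(f"raw {est.raw_tb:.1f} TB, index {est.index_tb:.1f} TB, "
      f"total {est.total_tb:.1f} TB, per peer {est.per_peer_tb:.1f} TB")
```

For the example inputs this reports a cluster total of 9.0 TB and 4.5 TB per peer.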

Using this simple formula, we estimate that the cluster needs roughly 9 TB of disk space to store, replicate, and retain 45 days of data. You can adjust the retention period and replication policies to see how they affect your storage needs.
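To see how those knobs interact, a short what-if loop can tabulate the cluster-wide total for a few retention periods and RF/SF combinations. The values below (45, 90 and 180 days; RF/SF of 2/2, 3/2 and 3/3) are illustrative choices, the 15%/35% ratios are the syslog estimates used above, and `cluster_total_tb` is a hypothetical helper. Note that an RF of 3 needs at least three peer nodes, so those rows assume a larger cluster than the 2-node example.

```python
RAW_RATIO = 0.15    # raw data share of incoming volume (syslog estimate)
INDEX_RATIO = 0.35  # index file share of incoming volume (syslog estimate)


def cluster_total_tb(daily_gb: float, rf: int, sf: int, retention_days: int) -> float:
    """Total cluster disk space in decimal TB (1 TB = 1,000 GB)."""
    per_day_gb = daily_gb * (RAW_RATIO * rf + INDEX_RATIO * sf)
    return per_day_gb * retention_days / 1000


DAILY_GB = 200
policies = [(2, 2), (3, 2), (3, 3)]  # (RF, SF) pairs; SF never exceeds RF
retentions = [45, 90, 180]           # retention periods in days

# Print one row per replication policy, one column per retention period.
print("RF/SF " + "".join(f"{days:>10}d" for days in retentions))
for rf, sf in policies:
    row = "".join(f"{cluster_total_tb(DAILY_GB, rf, sf, days):>8.1f} TB"
                  for days in retentions)
    print(f"{rf}/{sf}   {row}")
```

Because the total scales linearly with retention, doubling the retention period doubles the disk requirement for any given policy.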

Mustafa Ahamed

Mustafa has been with Splunk for 10 years and leads Product Management for the Splunk Enterprise platform. He's passionate about large-scale deployments and complex systems, and he loves to travel, explore new places, and discover new food!
