When deploying Splunk, the topic of how to manage index sizes will surface. The following is a detailed scenario showing how you can manage index space in Splunk. (This applies to the pre-4.2.x lines of Splunk; index sizing is much easier to manage in 4.2 and higher.)
A few key factors influence how much attention you must pay to disk space management: the minimum free disk space setting, how much of your storage is local versus non-local, and how your buckets are sized and counted.
The first thing you should be aware of is the minimum free disk space setting (minFreeSpace under [diskUsage] in server.conf). This setting tells Splunk to halt indexing when the amount of free disk space hits this value. By default, it is set to 2000 (MB). For enterprise deployments, the 2 GB default is often too small, since you may need room to move data around to free up space. Therefore, setting this to 20 GB or more may be ideal. To set it to 20 GB, create or edit the $SPLUNK_HOME/etc/system/local/server.conf file as follows:
[diskUsage]
minFreeSpace = 20000
The next topic of importance is the amount of local disk space. If all of your disks are local, then you do not need to be concerned with the following details. If your Splunk system has a non-local partition that is utilized for long-term storage, then you will need to manage the settings for where Splunk puts older data. There is a significant amount of information and terminology related to this topic, so we will break things down with an example scenario. Let us assume my system has 300 GB of fast local disk for indexing and searching, plus roughly 1 TB of non-local storage mounted at /storage for long-term retention.
Splunk strongly recommends that indexing and searching take place on the local disks. Therefore, we should keep the hot and warm buckets (homePath) on the local disk and put the cold and thawed buckets on the non-local /storage volume.
To set this for the main index, you would use the following settings in your indexes.conf file:
[main]
homePath = $SPLUNK_DB/defaultdb/db
coldPath = /storage/defaultdb/colddb
thawedPath = /storage/defaultdb/thaweddb
Now that we have told Splunk where to put the data, we still need to tell it how much space we have available within each location. To do this, calculate the total local disk space needed using the following formula:
LocalDiskSpace = minFreeSpace + (maxHotBuckets * maxDataSize) + (maxWarmDBCount * maxDataSize)
Let's break down each value in the above equation (all values in GB):
LocalDiskSpace = 300 (the local disk in our example system)
minFreeSpace = 20 (the value we set in server.conf)
maxHotBuckets = 10 (the default for the main index)
maxDataSize = 10 (the auto_high_volume default, roughly 10 GB per bucket)
maxWarmDBCount = 300 (the default)
Substituting these values into our formula gives us the following math:
300 = 20 + (10 * 10) + (300 * 10)
300 = 3120 ?????
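For those who like to see the arithmetic spelled out, here is a quick sketch in Python (purely illustrative, not a Splunk utility) that plugs the example's values, in GB, into the formula above:
# Values from the example above, all in GB
min_free_space = 20      # minFreeSpace (set in server.conf)
max_hot_buckets = 10     # maxHotBuckets
max_data_size = 10       # maxDataSize per bucket
max_warm_db_count = 300  # maxWarmDBCount (the stock setting)

required_local = (min_free_space
                  + max_hot_buckets * max_data_size
                  + max_warm_db_count * max_data_size)
print(required_local)    # 3120 -- far more than the 300 GB of local disk we have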
Since the local disk space does not match the bucket sizing, we should adjust the number of buckets and/or the size of the buckets. The best practice is to adjust only maxWarmDBCount, for two reasons: it is ideal to have multiple hot buckets for bucket-span purposes, and maxDataSize is optimally tuned out of the box. If necessary, you could configure between 3 and 5 maxHotBuckets, as that still allows for a broad range of bucket spans. To revisit our equation algebraically:
maxWarmDBCount = (LocalDiskSpace-minFreeSpace-(maxHotBuckets*maxDataSize))/maxDataSize
maxWarmDBCount = (300-20-100)/10
maxWarmDBCount = 18
We now have our warm DB count, which can be set in the main index stanza. A more user-friendly version of the formula for determining sizing:
maxWarmDBCount = LocalDiskSpace/maxDataSize - minFreeSpace/maxDataSize - maxHotBuckets
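As a sanity check, here is the same rearranged formula in a few lines of Python (again just an illustration, using the example's numbers in GB):
local_disk_space = 300   # usable local disk, GB
min_free_space = 20      # minFreeSpace, GB
max_hot_buckets = 10     # maxHotBuckets
max_data_size = 10       # maxDataSize per bucket, GB

# Integer division, since you can only keep a whole number of warm buckets
max_warm_db_count = (local_disk_space
                     - min_free_space
                     - max_hot_buckets * max_data_size) // max_data_size
print(max_warm_db_count)  # 18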
Now that we have the local storage sorted out, we must tune the total index size. Since we have already tuned the hot and warm buckets, Splunk will automatically 'freeze' (i.e., delete or archive) the oldest cold bucket once the index reaches its maximum total size. Here is the equation to calculate the maximum index size:
NonLocalDiskSpace = maxTotalDataSizeMB - LocalDiskSpace + minFreeSpace
maxTotalDataSizeMB = NonLocalDiskSpace + LocalDiskSpace - minFreeSpace
maxTotalDataSizeMB = 1000000 + 300000 - 20000
maxTotalDataSizeMB = 1280000
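The same calculation in Python, this time in MB (illustrative only; the 1,000,000 MB figure is the non-local /storage capacity from our example):
non_local_disk_space_mb = 1000000  # /storage capacity from the example
local_disk_space_mb = 300000       # 300 GB of local disk
min_free_space_mb = 20000          # minFreeSpace from server.conf

max_total_data_size_mb = (non_local_disk_space_mb
                          + local_disk_space_mb
                          - min_free_space_mb)
print(max_total_data_size_mb)      # 1280000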
In summary, our main index keeps up to 10 hot and 18 warm buckets (at roughly 10 GB each) on local disk, stores its cold buckets on /storage, and is capped at 1,280,000 MB in total. The final stanza in the indexes.conf file would look like this:
[main]
maxWarmDBCount = 18
maxTotalDataSizeMB = 1280000
homePath = $SPLUNK_DB/defaultdb/db
coldPath = /storage/defaultdb/colddb
thawedPath = /storage/defaultdb/thaweddb
----------------------------------------------------
Thanks!
Simeon Yep