When deploying Splunk, the topic of how to manage index sizes will surface. The following is a detailed scenario showing how you can manage index space in Splunk. (This applies to the pre-4.2.x lines of Splunk; index sizing is much easier to manage in 4.2 and higher.)
A few key factors influence how much attention you must pay to disk space management: the minimum free disk space setting, how much of your storage is local versus non-local, and how your buckets are sized and counted.
The first thing you should be aware of is the minimum free disk space setting (minFreeSpace under [diskUsage] in server.conf). This setting tells Splunk to halt indexing when the amount of free disk space hits this value. By default, it is set to 2000 (MB). For enterprise deployments, the 2 GB default is often too small, since you may need room to move data around to free up space. Therefore, setting this to 20 GB or more may be ideal. To set it to 20 GB, create or edit the $SPLUNK_HOME/etc/system/local/server.conf file as follows:
[diskUsage]
minFreeSpace = 20000
The next topic of importance is the amount of local disk space. If all of your disks are local, then you do not need to be concerned with the following details. If your Splunk system has a non-local partition that is utilized for long-term storage, then you will need to manage the settings for where Splunk puts older data. There is a significant amount of information and terminology related to this topic, so we will break things down with an example scenario. Let us assume my system has 300 GB of fast local disk for indexing and searching, plus roughly 1 TB of non-local storage mounted at /storage for long-term retention.
Splunk strongly recommends that indexing and searching take place on the local disks. Therefore, we should keep the hot and warm buckets (homePath) on the local disk and put the cold and thawed buckets on the non-local /storage volume.
To set this for the main index, you would use the following settings in your indexes.conf file:
[main]
homePath = $SPLUNK_DB/defaultdb/db
coldPath = /storage/defaultdb/colddb
thawedPath = /storage/defaultdb/thaweddb
Now that we have told Splunk where to put the data, we still need to tell it how much space we have available within each location. To do this, calculate the total local disk space needed using the following formula:
LocalDiskSpace = minFreeSpace + (maxHotBuckets * maxDataSize) + (maxWarmDBCount * maxDataSize)
Let's break down each value in the above equation (all values in GB):
LocalDiskSpace = 300 (the local disk in our example system)
minFreeSpace = 20 (the value we set in server.conf)
maxHotBuckets = 10 (the default for the main index)
maxDataSize = 10 (the auto_high_volume default, roughly 10 GB per bucket)
maxWarmDBCount = 300 (the default)
Substituting these values into our formula gives us the following math:
300 = 20 + (10 * 10) + (300 * 10)
300 = 3120 ?????
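For those who like to see the arithmetic spelled out, here is a quick sketch in Python (purely illustrative, not a Splunk utility) that plugs the example's values, in GB, into the formula above:
# Values from the example above, all in GB
min_free_space = 20      # minFreeSpace (set in server.conf)
max_hot_buckets = 10     # maxHotBuckets
max_data_size = 10       # maxDataSize per bucket
max_warm_db_count = 300  # maxWarmDBCount (the stock setting)

required_local = (min_free_space
                  + max_hot_buckets * max_data_size
                  + max_warm_db_count * max_data_size)
print(required_local)    # 3120 -- far more than the 300 GB of local disk we have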
Since the local disk space does not match the bucket sizing, we should adjust the number of buckets and/or the size of the buckets. The best practice is to adjust only maxWarmDBCount, for two reasons: it is ideal to have multiple hot buckets for bucket-span purposes, and maxDataSize is optimally tuned out of the box. If necessary, you could configure between 3 and 5 maxHotBuckets, as that still allows for a broad range of bucket spans. To revisit our equation algebraically:
maxWarmDBCount = (LocalDiskSpace-minFreeSpace-(maxHotBuckets*maxDataSize))/maxDataSize
maxWarmDBCount = (300-20-100)/10
maxWarmDBCount = 18
We now have our warm DB count, which can be set in the main index stanza. A more user-friendly version of the formula for determining sizing:
maxWarmDBCount = LocalDiskSpace/maxDataSize - minFreeSpace/maxDataSize - maxHotBuckets
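As a sanity check, here is the same rearranged formula in a few lines of Python (again just an illustration, using the example's numbers in GB):
local_disk_space = 300   # usable local disk, GB
min_free_space = 20      # minFreeSpace, GB
max_hot_buckets = 10     # maxHotBuckets
max_data_size = 10       # maxDataSize per bucket, GB

# Integer division, since you can only keep a whole number of warm buckets
max_warm_db_count = (local_disk_space
                     - min_free_space
                     - max_hot_buckets * max_data_size) // max_data_size
print(max_warm_db_count)  # 18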
Now that we have the local storage sorted out, we must tune the total index size. Since we have already tuned the hot and warm buckets, Splunk will automatically 'freeze' (i.e., delete or archive) the oldest cold bucket once the index reaches its maximum total size. Here is the equation to calculate the maximum index size:
NonLocalDiskSpace = maxTotalDataSizeMB - LocalDiskSpace + minFreeSpace
maxTotalDataSizeMB = NonLocalDiskSpace + LocalDiskSpace - minFreeSpace
maxTotalDataSizeMB = 1000000 + 300000 - 20000
maxTotalDataSizeMB = 1280000
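The same calculation in Python, this time in MB (illustrative only; the 1,000,000 MB figure is the non-local /storage capacity from our example):
non_local_disk_space_mb = 1000000  # /storage capacity from the example
local_disk_space_mb = 300000       # 300 GB of local disk
min_free_space_mb = 20000          # minFreeSpace from server.conf

max_total_data_size_mb = (non_local_disk_space_mb
                          + local_disk_space_mb
                          - min_free_space_mb)
print(max_total_data_size_mb)      # 1280000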
In summary, our main index keeps up to 10 hot and 18 warm buckets (at roughly 10 GB each) on local disk, stores its cold buckets on /storage, and is capped at 1,280,000 MB in total. The final stanza in the indexes.conf file would look like this:
[main]
maxWarmDBCount = 18
maxTotalDataSizeMB = 1280000
homePath = $SPLUNK_DB/defaultdb/db
coldPath = /storage/defaultdb/colddb
thawedPath = /storage/defaultdb/thaweddb
----------------------------------------------------
Thanks!
Simeon Yep