Increase Max Index Size and Archive Indexed Data

This post was authored by Detecx team NetbyteSec

Introduction

In the dynamic realm of data analytics and log management, Splunk stands as a powerful tool, transforming raw data into actionable insight. In this article, we explain the complexities of Splunk indexing, focusing on crucial aspects that can significantly impact your system's availability.

What issue leads to this setup? What happens if the index reaches its maximum size?

When the size of the data exceeds the maximum size of the index, the oldest data in the index is no longer searchable: it rolls to a frozen bucket. A frozen bucket holds data that has rolled from cold. By default, the indexer deletes frozen data, but the user can archive it instead. Archived data can later be thawed; a thawed bucket holds data restored from the archive. If the user archives frozen data, it can later be returned to the index by thawing it.
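As a rough illustration of size-based freezing, the sketch below models an index as a list of buckets and selects the oldest ones to freeze until the total fits under the limit. The `Bucket` class and `buckets_to_freeze` function are hypothetical names used only for illustration; this is a simplified model of the behavior described above, not Splunk's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    name: str
    size_mb: int
    latest_event_time: int  # epoch seconds of the newest event in the bucket


def buckets_to_freeze(buckets, max_total_mb):
    """Return the oldest buckets that would roll to frozen so that the
    remaining buckets fit within max_total_mb (simplified model)."""
    total = sum(b.size_mb for b in buckets)
    frozen = []
    # Walk the buckets from oldest to newest and freeze until the index fits.
    for bucket in sorted(buckets, key=lambda b: b.latest_event_time):
        if total <= max_total_mb:
            break
        frozen.append(bucket)
        total -= bucket.size_mb
    return frozen
```

For example, with three 300 MB buckets and a 700 MB limit, only the oldest bucket is selected, because removing it brings the total down to 600 MB.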

Increase and set maximum index size

First, the user needs to know which Splunk architecture has been deployed. The way to increase the maximum index size differs between a single instance, a distributed environment, and a clustered environment. A distributed environment separates the indexing and searching of data in Splunk. For non-clustered indexes, edit indexes.conf in $SPLUNK_HOME/etc/system/local.

For clustered indexes, edit indexes.conf on the cluster master node. Splunk indexes are stored under the home path ($SPLUNK_DB). Changing the default maximum size affects all indexes, because they reside in the same volume. The detailed configuration of each index is located in indexes.conf. Set the maxTotalDataSizeMB attribute in indexes.conf to configure the index storage size.
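A minimal sketch of such a stanza is shown below. The index name `my_index` is a hypothetical placeholder, and the stanza assumes the index is already defined elsewhere (a new index would also need its path settings).

```ini
# indexes.conf (sketch; "my_index" is a hypothetical index name)
[my_index]
# Maximum total size of the index in MB (the default is 500000)
maxTotalDataSizeMB = 750000
```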

Figure 1: maxTotalDataSizeMB attribute


Figure 2: Maximum index size has been changed

After changing the maxTotalDataSizeMB attribute to 750000 (the default is 500000), view the cluster bundle status.

Command: # /opt/splunk/bin/splunk show cluster-bundle-status

Figure 3: Cluster bundle status

This command shows the status of bundle validation. The output here indicates that validation of the cluster bundle succeeded; if validation fails, the output reports the failure. It also indicates that the peers do not need to restart.

Figure 4: Configuration bundle actions in Splunk Web

This can also be done through Splunk Web:

1. In Splunk Web on the master node, click Settings, then Indexer Clustering.

2. Click Edit, then Configuration Bundle Actions.

3. Click Validate and Check Restart to check whether a restart is required.

4. Click Push to distribute the configuration bundle to the peers.

Archive indexed data

Depending on how data retirement is configured, the indexer deletes data when it reaches the frozen state. If we want the data to be archived rather than deleted, we must configure the indexer either to archive the data into a directory that we choose, or to run a customized archiving script.

We can do this in two ways in indexes.conf: set the coldToFrozenDir attribute, which specifies the location where the indexer automatically archives frozen data, or set a valid coldToFrozenScript attribute, which specifies a user-supplied script that the indexer runs when data reaches the frozen state. Set only one of these two attributes; if you set both, coldToFrozenDir takes precedence over coldToFrozenScript.

If you set neither coldToFrozenDir nor a valid coldToFrozenScript, the indexer runs a default script that simply writes the name of the bucket being erased to the log file $SPLUNK_HOME/var/log/splunk/splunkd_stdout.log, after which it erases the bucket.

Archive with coldToFrozenDir attribute

If we set the coldToFrozenDir attribute, the indexer automatically copies frozen buckets to the specified location before erasing the data from the index.
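A minimal sketch of this setting is shown below. Both the index name `my_index` and the archive path are hypothetical placeholders; the directory should be on a volume with enough space and be writable by the user that runs Splunk.

```ini
# indexes.conf (sketch; index name and path are hypothetical)
[my_index]
# Frozen buckets are copied here before being removed from the index
coldToFrozenDir = /data/splunk_frozen/my_index
```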

Figure 5: New volume for the archive directory and the path set to coldToFrozenDir

Archive with an archiving script

If we set the coldToFrozenScript attribute, the script we specify will run just before the indexer erases the frozen data from the index. The indexer also ships with an example archiving script that we can edit and customize, $SPLUNK_HOME/bin/coldToFrozenExample.py.
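The following is a minimal sketch of what a custom archiving script could look like. It is not the shipped coldToFrozenExample.py. It assumes the indexer passes the bucket directory as the script's first argument and treats a non-zero exit code as a failed archive. The archive root `/data/splunk_frozen` and the function name `archive_bucket` are hypothetical.

```python
import os
import shutil
import sys

# Hypothetical archive location; adjust to your own environment.
ARCHIVE_ROOT = "/data/splunk_frozen"


def archive_bucket(bucket_dir, archive_root=ARCHIVE_ROOT):
    """Copy one bucket directory into archive_root/<index>/<bucket> and
    return the destination path."""
    bucket_dir = os.path.normpath(bucket_dir)
    if not os.path.isdir(bucket_dir):
        raise ValueError("not a bucket directory: " + bucket_dir)
    # Assumed layout: .../<index>/<db or colddb>/<bucket>
    index_name = os.path.basename(os.path.dirname(os.path.dirname(bucket_dir)))
    dest = os.path.join(archive_root, index_name, os.path.basename(bucket_dir))
    if os.path.exists(dest):
        raise FileExistsError("bucket already archived: " + dest)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copytree(bucket_dir, dest)
    return dest


# Run only when a bucket path is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    try:
        archive_bucket(sys.argv[1])
    except Exception as exc:
        sys.stderr.write("archive failed: %s\n" % exc)
        sys.exit(1)
```

The sketch copies the whole bucket unchanged. A production script might keep only the raw data to save space, which is worth checking against the shipped example before relying on it.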

Figure 6: Path that has been set up for the attributes

Pushing configuration bundles in indexer cluster

To distribute new or changed files and apps across all peers, we do the following:

1. Prepare the files and apps and test them.

2. Move the files and apps into the configuration bundle on the master node. Place standalone configuration files in the /_cluster/local subdirectory.

3. Validate the bundle. In Splunk Web, click Validate and Check Restart to check whether a restart is required. This step is optional.

4. Push the entire bundle to the peers from the master. This overwrites the contents of the peers' current bundle.

For steps 3 and 4, we can use either Splunk Web or the CLI. When the push succeeds, the files are located in each peer's local $SPLUNK_HOME/etc/slave-apps, and the peers use the new set of configurations. Leave the files in $SPLUNK_HOME/etc/slave-apps.
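For the CLI route, the validate, push, and status steps can be sketched as a small Python wrapper run on the master node. The function names are hypothetical, and the binary path is taken from the command shown earlier. The sketch assumes the CLI session is already authenticated and uses `--answer-yes` to skip the confirmation prompt. In practice you would probably check the bundle status between validating and applying, rather than running the commands back to back.

```python
import subprocess

# Path taken from the command shown earlier in this post.
SPLUNK_BIN = "/opt/splunk/bin/splunk"


def bundle_commands(splunk_bin=SPLUNK_BIN):
    """Return the CLI commands for validating, pushing, and checking the bundle."""
    return [
        [splunk_bin, "validate", "cluster-bundle", "--check-restart"],
        [splunk_bin, "apply", "cluster-bundle", "--answer-yes"],
        [splunk_bin, "show", "cluster-bundle-status"],
    ]


def push_bundle(run=subprocess.run, splunk_bin=SPLUNK_BIN):
    """Run each command in order and stop at the first one that fails."""
    for cmd in bundle_commands(splunk_bin):
        run(cmd, check=True)
```

Passing a different `run` callable makes it possible to dry-run the sequence, for example by printing each command instead of executing it.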

In conclusion, expanding the maximum index size and implementing a data archiving strategy are important steps in optimizing your Splunk system. Increasing the index size helps you avoid data being deleted while the system is running, and archiving gives you a backup plan for the ingested data in case the index reaches its maximum size. Both actions can contribute to a more scalable and responsive data infrastructure.