Elasticsearch Storage Tiering – Index to SSD then Archive to Spinning Disk


Elasticsearch Storage Tiering – Overview

In this article I will explain how to configure Elasticsearch storage tiering. For the purposes of this example, let’s say we need to provide an ELK (Elasticsearch, Logstash, Kibana) service that keeps logs online for 90 days. Most of our queries are for events from the last 5 days, so we will keep the most recent week (7 days) on expensive, fast servers with SSD storage and age older events off to cheap, slow near-line spinning-disk systems.
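To put rough numbers on this split, here is a small Python sketch that estimates how much disk each tier needs. It assumes every daily index is the same size and is stored once plus its replicas, and it ignores overhead such as segment merges and disk watermarks. The function name and the 10 GB/day figure are hypothetical.

```python
def tier_capacity_gb(daily_primary_gb, replicas=1, ssd_days=7, retention_days=90):
    """Rough disk needed per tier, assuming a constant daily index size.

    Each day's index is stored (1 + replicas) times. The newest ssd_days days
    live on SSD nodes; the remaining days up to retention_days live on HDD nodes.
    """
    per_day_gb = daily_primary_gb * (1 + replicas)
    ssd_gb = per_day_gb * ssd_days
    hdd_gb = per_day_gb * (retention_days - ssd_days)
    return ssd_gb, hdd_gb


if __name__ == "__main__":
    # Hypothetical input: 10 GB of primary data per day with one replica.
    print(tier_capacity_gb(10))  # (140, 1660)
```

With these assumptions only 7 of the 90 days sit on SSD, so roughly 92% of the stored data ends up on the cheaper spinning disks.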

Let’s get started!

Set Elasticsearch node metadata

First, we will set a disktype metadata attribute on each of our Elasticsearch nodes. Nodes with SSD storage are tagged disktype:ssd and nodes with spinning disk are tagged disktype:hdd.

SSD (“ssd”) node configuration example:

# elasticsearch.yml
node.disktype: ssd

Spinning disk (“hdd”) node configuration example:

# elasticsearch.yml
node.disktype: hdd

Once these values are set, perform a rolling restart of your cluster so that each node picks up its disktype attribute.
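After the restart, you can confirm that every node picked up its tag by fetching GET /_nodes (for example curl 'localhost:9200/_nodes?pretty') and grouping the nodes by attribute. The Python sketch below does the grouping on the parsed JSON. It assumes the Elasticsearch 2.x-era response layout, in which custom node settings such as node.disktype appear under each node's "attributes" map; the function name is hypothetical.

```python
def nodes_by_disktype(nodes_info):
    """Group node names by their disktype attribute.

    nodes_info is the parsed JSON body of GET /_nodes. Custom node settings such
    as node.disktype are assumed to appear under each node's "attributes" map.
    Nodes without the attribute are grouped under "untagged".
    """
    groups = {}
    for node in nodes_info.get("nodes", {}).values():
        disktype = node.get("attributes", {}).get("disktype", "untagged")
        groups.setdefault(disktype, []).append(node.get("name", "?"))
    for names in groups.values():
        names.sort()
    return groups


if __name__ == "__main__":
    import json

    # Trimmed, hypothetical response body for three nodes.
    sample = json.loads('''{"nodes": {
        "a1": {"name": "es-ssd-1", "attributes": {"disktype": "ssd"}},
        "b2": {"name": "es-hdd-1", "attributes": {"disktype": "hdd"}},
        "c3": {"name": "es-new-1", "attributes": {}}}}''')
    print(nodes_by_disktype(sample))
```

Any node listed under "untagged" was either missed when editing elasticsearch.yml or has not been restarted yet.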

Note: You may want to look into delayed shard allocation, which speeds up rolling maintenance and reduces resource usage by not immediately reallocating the shards of a node that is only briefly offline: https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html
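The delayed allocation page linked above configures this through the index setting index.unassigned.node_left.delayed_timeout. As a minimal sketch, the Python function below only builds that settings request rather than sending it; the helper name and the 5m default are illustrative assumptions, so choose a timeout longer than a typical node restart in your cluster.

```python
import json


def delayed_allocation_request(timeout="5m", indices="_all"):
    """Build (method, path, body) for an index settings update that delays
    reallocating a departed node's shards by `timeout`."""
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return "PUT", "/{}/_settings".format(indices), json.dumps(body)


if __name__ == "__main__":
    method, path, body = delayed_allocation_request()
    # Send it with, e.g.: curl -XPUT 'localhost:9200/_all/_settings' -d '<body>'
    print(method, path, body)
```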

Configure the Logstash index template to require SSD nodes

Now that our Elasticsearch nodes are tagged according to their storage type, we can configure Logstash to write new indices to SSD. We’ll accomplish this by adding an index shard allocation filter to the index template that Elasticsearch applies when Logstash creates each new logstash-YYYY.MM.DD index.

First, we need to make sure Logstash will not overwrite our template changes. To do this, set template_overwrite to false in the elasticsearch output section of your Logstash configuration. This is already the plugin's default, but setting it explicitly documents the intent.

# logstash.conf
output {
  elasticsearch {
    # your Elasticsearch output section here
    # add the following line
    template_overwrite => false
  }
}

Note: If your Logstash configuration is more complex and has multiple elasticsearch output sections, apply this setting to every relevant one.

Now we’re ready to update the Logstash template. Elasticsearch applies this index template each day when Logstash creates a new index (logstash-YYYY.MM.DD). We are going to add the index shard allocation filter setting “index.routing.allocation.require.disktype” to the template, which requires new Logstash indices to be allocated only to nodes whose disktype is “ssd”.

"index.routing.allocation.require.disktype" : "ssd"

To add this setting, you can either pull your running template and merge in the shard allocation filter above, or use the example below, which was created from a vanilla Logstash 2.3 template.

To merge, you can pull your active Logstash template like this:

curl -XGET 'localhost:9200/_template/logstash?pretty'
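If you merge by hand, note that the GET response wraps the template in an object keyed by the template name (here "logstash"), and that wrapper has to be removed before you PUT the template back. Below is a minimal Python sketch of the merge, assuming the settings come back nested under an "index" object as in the full example template in this article; the function name is hypothetical.

```python
import copy
import json

FILTER_KEY = "routing.allocation.require.disktype"


def merge_disktype_filter(get_response, name="logstash", disktype="ssd"):
    """Turn a GET /_template/<name> response into a PUT body with the filter added.

    The GET response wraps the template in an object keyed by its name; that
    wrapper is dropped because PUT /_template/<name> expects the bare template.
    The input is deep-copied, so the caller's data is left unchanged.
    """
    template = copy.deepcopy(get_response[name])
    index_settings = template.setdefault("settings", {}).setdefault("index", {})
    index_settings[FILTER_KEY] = disktype
    return template


if __name__ == "__main__":
    # Trimmed, hypothetical GET response.
    current = json.loads('''{"logstash": {"order": 0, "template": "logstash-*",
        "settings": {"index": {"refresh_interval": "5s"}},
        "mappings": {}, "aliases": {}}}''')
    print(json.dumps(merge_disktype_filter(current), indent=2))
```

The printed JSON can then be sent with curl -XPUT to /_template/logstash.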

Here is a full example Logstash template, based on a vanilla Logstash 2.3 template, with the “routing.allocation.require.disktype” : “ssd” setting added to the index settings.

curl -XPUT http://localhost:9200/_template/logstash -d'
{
    "template" : "logstash-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s",
        "routing.allocation.require.disktype" : "ssd"
      }
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [ {
          "message_field" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "string"
            },
            "match_mapping_type" : "string",
            "match" : "message"
          }
        }, {
          "string_fields" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "string",
              "fields" : {
                "raw" : {
                  "index" : "not_analyzed",
                  "ignore_above" : 256,
                  "type" : "string"
                }
              }
            },
            "match_mapping_type" : "string",
            "match" : "*"
          }
        } ],
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "geoip" : {
            "dynamic" : true,
            "properties" : {
              "location" : {
                "type" : "geo_point"
              },
              "longitude" : {
                "type" : "float"
              },
              "latitude" : {
                "type" : "float"
              },
              "ip" : {
                "type" : "ip"
              }
            }
          },
          "@version" : {
            "index" : "not_analyzed",
            "type" : "string"
          }
        },
        "_all" : {
          "enabled" : true,
          "omit_norms" : true
        }
      }
    },
    "aliases" : { }
  }
'

You can read more about Elasticsearch shard allocation filters at https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-allocation-filtering.html
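Once the template is in place, the next daily index should carry the filter. Here is a small Python sketch that reads it back from the parsed body of GET /logstash-YYYY.MM.DD/_settings. To stay tolerant of different response shapes, it accepts the nested layout, the fully flattened layout (?flat_settings=true), and the dotted-key form used in the template above. The function name is hypothetical.

```python
def required_disktype(settings_response, index):
    """Return the disktype an index must be allocated to, or None if unset.

    settings_response is the parsed body of GET /<index>/_settings.
    """
    settings = settings_response[index]["settings"]
    # Fully flattened keys, e.g. when requested with ?flat_settings=true.
    if "index.routing.allocation.require.disktype" in settings:
        return settings["index.routing.allocation.require.disktype"]
    index_settings = settings.get("index", {})
    # Dotted key under "index", as written in the template.
    if "routing.allocation.require.disktype" in index_settings:
        return index_settings["routing.allocation.require.disktype"]
    # Nested objects: routing -> allocation -> require -> disktype.
    return (index_settings.get("routing", {})
            .get("allocation", {})
            .get("require", {})
            .get("disktype"))
```

A result of "ssd" for today's index means the template was applied; None suggests the index was created before the template change or matched a different template.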

Configure Curator to move indices to spinning disk over time

To move indices from SSD to spinning disk as they age, we will use Curator. We will also use Curator to delete indices that are older than 90 days. Both Curator commands run from cron once a day.

# Move indices older than 7 days to spinning disk
curator --host localhost allocation --rule disktype=hdd indices --older-than 7 --time-unit days --timestring '%Y.%m.%d' --prefix logstash

# Remove indices older than 90 days
curator --host localhost delete indices --older-than 90 --time-unit days --timestring '%Y.%m.%d' --prefix logstash

Note: This is Curator v3.5 syntax; Curator 4 and later replace these command-line options with YAML action files. Full Curator documentation is available from Elastic at https://www.elastic.co/guide/en/elasticsearch/client/curator/3.5/index.html
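To make the daily decision concrete, here is a rough Python sketch of what the two Curator commands select, assuming daily indices named logstash-YYYY.MM.DD. It returns the requests it would make instead of sending them. Curator's own cutoff arithmetic and API calls may differ (for example, its allocation run also matches indices past 90 days, while this sketch only deletes those), and the function name and parameters are hypothetical.

```python
from datetime import date, datetime

HDD_FILTER = {"index.routing.allocation.require.disktype": "hdd"}


def curator_like_actions(index_names, today, prefix="logstash-",
                         move_after_days=7, delete_after_days=90):
    """Return (method, path, body) tuples approximating the two daily Curator jobs.

    Names that do not parse as <prefix>YYYY.MM.DD are skipped. An index past
    delete_after_days is only deleted here; it is not moved first.
    """
    actions = []
    for name in sorted(index_names):
        if not name.startswith(prefix):
            continue
        try:
            created = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue
        age_days = (today - created).days
        if age_days > delete_after_days:
            actions.append(("DELETE", "/" + name, None))
        elif age_days > move_after_days:
            # Same shard allocation filter as the template, now requiring hdd.
            actions.append(("PUT", "/" + name + "/_settings", dict(HDD_FILTER)))
    return actions


if __name__ == "__main__":
    names = ["logstash-2016.05.18", "logstash-2016.05.01",
             "logstash-2016.01.01", ".kibana"]
    for action in curator_like_actions(names, today=date(2016, 5, 20)):
        print(action)
```

Changing the required disktype on an existing index is what triggers Elasticsearch to relocate its shards; the data itself is never copied by Curator.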

Done!

You should start seeing new indices created on SSD nodes and moved to the spinning-disk (hdd) nodes once they are more than 7 days old. You can use a tool like Marvel to monitor your Elasticsearch cluster and visualize shard and index allocation.
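For a quick check without Marvel, GET /_cat/shards?h=index,node lists each shard copy together with the node that holds it. The Python sketch below combines that text output with a node-name-to-disktype map (which you could build from GET /_nodes) to show which tiers each index currently occupies; an index reported on both ssd and hdd is likely still mid-migration. The function name is hypothetical, and the parsing assumes plain whitespace-separated _cat output.

```python
def disktypes_per_index(cat_shards_text, node_disktype):
    """Map each index to the set of disktypes currently holding its shards.

    cat_shards_text is the plain-text output of GET /_cat/shards?h=index,node;
    node_disktype maps node name -> "ssd" or "hdd". A line with no node column
    is reported as "unassigned"; a node missing from the map (which may include
    a relocating shard's multi-part node column) is reported as "unknown".
    """
    placement = {}
    for line in cat_shards_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Index names cannot contain spaces, but node names can.
        index, _, node = line.partition(" ")
        node = node.strip()
        tier = node_disktype.get(node, "unknown") if node else "unassigned"
        placement.setdefault(index, set()).add(tier)
    return placement


if __name__ == "__main__":
    # Hypothetical _cat output and node map.
    text = "logstash-2016.05.20 es-ssd-1\nlogstash-2016.05.01 es-hdd-1\n"
    print(disktypes_per_index(text, {"es-ssd-1": "ssd", "es-hdd-1": "hdd"}))
```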
