Kelson Martins

Moving Elasticsearch indexes with elasticdump

Software engineer. Geek. Traveller. Wannabe athlete. Lifelong student. Works at IBM and hosts the @HardcodeCast.



Introduction

Recently, I had a requirement to migrate data between Elasticsearch clusters while building a new ELK stack.
A few tools can get the job done, including the Logstash elasticsearch input plugin, the Elasticsearch reindex API, and elasticdump.
While evaluating these tools, I experimented with elasticdump, and this post details how it can move data from one Elasticsearch node to another.

First things first. What is elasticdump?

Elasticdump is an open-source tool whose official description states its goal as moving and saving Elasticsearch indexes. It works by requesting data from an input and redirecting it into an output, where each of the input and output may be an Elasticsearch URL or a file.

Elasticsearch scenario

For this article's scenario, the requirement is to move data from one Elasticsearch cluster into another.
Based on elasticdump's features, two options can achieve this goal:

Option 1) Using an Elasticsearch URL for both input and output.
This is the most straightforward approach, requiring a single command to move the data across the two Elasticsearch clusters.

--input=cluster_one
--output=cluster_two

Option 2) Using an Elasticsearch URL for input and File as output, followed by a File input and an Elasticsearch URL output.

This approach requires at least two commands: one to save the data from a cluster into a file, followed by one that uses the generated file as input for another cluster.
It may be useful if you want to back up the indexes before taking further action.

--input=cluster_one
--output=data.json
--input=data.json
--output=cluster_two

Installing Elasticdump

Installation of elasticdump is performed via npm, the Node Package Manager, which distributes open-source Node.js packages through its online registry.
You can install npm through:

# Ubuntu
sudo apt-get install npm

# CentOS
sudo yum install npm

With npm installed, install elasticdump with:

npm install elasticdump -g

Using elasticdump

Using elasticdump is as simple as running the following:

elasticdump \
  --input={{INPUT}} \
  --output={{OUTPUT}} \
  --type={{TYPE}}

Where {{INPUT}} and {{OUTPUT}} can each be an Elasticsearch URL such as {protocol}://{host}:{port}/{index} or a file such as /tmp/dump.json, while {{TYPE}} must be analyzer, mapping or data.

Export Elasticsearch Data – Scenario 1

For this scenario, the aim is to export data from an Elasticsearch index called docker-daemon and inject it into a remote Elasticsearch node, keeping the same index name.
You can achieve this with the following command:

elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=http://user:password@new_node:9200/docker-daemon \
  --type=data

The following is a sample of the expected output:

Thu, 21 Sep 2017 14:40:29 GMT | starting dump
Thu, 21 Sep 2017 14:40:31 GMT | from source elasticsearch (offset: 0)
Thu, 21 Sep 2017 14:40:33 GMT | to destination elasticsearch, wrote 53
Thu, 21 Sep 2017 14:40:33 GMT | from source elasticsearch (offset: 53)
Thu, 21 Sep 2017 14:40:33 GMT | Total Writes: 53
Thu, 21 Sep 2017 14:40:33 GMT | dump complete

To confirm the data transfer occurred successfully, perform the following command on the target Elasticsearch node:

$ curl -u user:password localhost:9200/_cat/indices?v | grep docker-daemon

If the elasticdump action executed successfully, the index should appear as available.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2394  100  2394    0     0   245k      0 --:--:-- --:--:-- --:--:--  259k
green  open   logstash-docker-daemon        eilJdiZvSGixTNIfMwP-kw   5   2         41            0    292.3kb        292.3kb
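Another way to verify the transfer, assuming the same hosts and credentials used above, is to compare document counts with Elasticsearch's _count API; both clusters should report the same number once the dump finishes:

```shell
# Sanity check: document counts on source and target should match
# after the dump (old_node/new_node are the hosts used above).
curl -s -u user:password http://old_node:9200/docker-daemon/_count
curl -s -u user:password http://new_node:9200/docker-daemon/_count
```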

Export Elasticsearch Data – Scenario 2

For this scenario, two distinct steps are necessary:

  1. Export data from an Elasticsearch index into a file.
  2. Import data from a file into an Elasticsearch index.

The following commands achieve the desired goal:

elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=/data/docker-daemon.json \
  --type=data
elasticdump \
  --input=/data/docker-daemon.json \
  --output=http://user:password@new_node:9200/docker-daemon \
  --type=data

Observe that we first export the data from the index into the file /data/docker-daemon.json, then use this newly generated file as the input for another Elasticsearch node.
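If the intermediate file is large, elasticdump can also stream to stdout by passing $ as the output (a feature described in the tool's documentation), which lets you compress the dump on the fly rather than writing raw JSON to disk:

```shell
# Stream the dump to stdout and compress it as it is written.
elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=$ \
  | gzip > /data/docker-daemon.json.gz

# Decompress with gunzip before using the file as an input again.
```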

Analyzers and Mappings

What this article has shown so far is the most basic method of moving an index from one node to another. In a more realistic scenario, you will want to move the index together with its appropriate analyzers and field mappings.

For this scenario, analyzers and mappings should be moved before the index data itself. We can achieve this by chaining the 3 statements as shown below:

elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=http://user:password@new_node:9200/docker-daemon \
  --type=analyzer
elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=http://user:password@new_node:9200/docker-daemon \
  --type=mapping
elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=http://user:password@new_node:9200/docker-daemon \
  --type=data
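Since the three commands differ only in their --type, a small shell loop keeps the required order (analyzer, then mapping, then data) explicit; this is a sketch using the same hosts and credentials as above:

```shell
# Run the three dump types in order: analyzer and mapping must land
# on the target before the documents themselves.
for type in analyzer mapping data; do
  elasticdump \
    --input=http://user:password@old_node:9200/docker-daemon \
    --output=http://user:password@new_node:9200/docker-daemon \
    --type=$type
done
```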

Extra Options

Even though this article presented only the basic parameters of elasticdump, a series of other parameters is available. Some commonly used ones include:

  • --searchBody: Useful when you do not want to export an entire index. Example: --searchBody '{"query":{"term":{"containerName": "nginx"}}}'.
  • --limit: Sets how many objects to move per batch operation. Defaults to 100.
  • --delete: Deletes documents from the input source as they are moved.
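Combining the options above, here is a hedged example that exports only the nginx documents from the scenario's index, in batches of 500 (the output file path is illustrative):

```shell
# Partial export: only documents whose containerName is "nginx",
# moved in batches of 500 instead of the default 100.
elasticdump \
  --input=http://user:password@old_node:9200/docker-daemon \
  --output=/data/docker-daemon-nginx.json \
  --searchBody '{"query":{"term":{"containerName": "nginx"}}}' \
  --limit=500 \
  --type=data
```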

You can find the full list of parameters on the official project page.

Final Considerations

Moving Elasticsearch indexes across nodes and clusters should not be a burden, and elasticdump proves it. The tool is easy to use and well documented, so if you need to move Elasticsearch indexes around, look no further.

As mentioned earlier in the article, there are alternatives such as the Logstash elasticsearch input plugin and the Elasticsearch reindex API, but those will be covered in future articles. Stay tuned!
