Introduction to Rally for Elasticsearch benchmarking

Summary

Anyone who manages Elasticsearch clusters must be aware of the health of the cluster at all times, whether through monitoring tools such as ElasticsearchHQ and X-Pack or through simple REST API scripts.
Monitoring is extremely important, but there is one point just as important: knowing the cluster’s capacity and limits. Knowing these metrics allows the cluster administrator to evaluate capacity and reduce the chances of slow indexing periods or even an incident.
That said, this post provides the basic steps for using Rally to run Elasticsearch benchmark tests against an existing cluster, so we can extract key metrics such as indexing time, latency, throughput and error rate.
The collected metrics are key inputs to decisions such as whether to scale out nodes, reduce logging, and so on.

What is Rally and how to Install it?

According to the official documentation, Rally is an Elasticsearch benchmarking tool that can help us with the following tasks:

  • Setup and teardown of an Elasticsearch cluster for benchmarking
  • Management of benchmark data and specifications even across Elasticsearch versions
  • Running benchmarks and recording results
  • Finding performance problems by attaching so-called telemetry devices
  • Comparing performance results

For now, the feature we will focus on is benchmarking a remote cluster, so before following the steps below, make sure you have access to an Elasticsearch cluster.

To install Rally, you can perform the following on a Debian/Ubuntu based system:

Install Python 3.4 or higher, along with pip and the development headers.

$ sudo apt-get install gcc python3-pip python3-dev

Install Git.

$ sudo apt-get install git

Finally, install Rally.

$ pip3 install esrally

If you are using a distribution other than Debian/Ubuntu, refer to the official installation guide here.
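To quickly confirm that the installation succeeded, you can ask Rally to print its version (this should work on any recent Rally release):

$ esrally --version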

Configuring Rally

With Rally installed, we require a one-time configuration. To configure Rally, perform the following on your terminal:

$ esrally

Once executed, you will see an output similar to:

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

Running simple configuration. Run the advanced configuration with:

  esrally configure --advanced-config


WARNING: Will overwrite existing config file at [/home/kelson/.rally/rally.ini]

* Autodetecting available third-party software
  git    : [OK]
  JDK    : [MISSING] (You cannot benchmark Elasticsearch on this machine without a JDK.)

* Setting up benchmark data directory in /home/kelson/.rally/benchmarks
Enter the JDK 10 root directory (Press Enter to skip):

Note that Rally auto-detected Git but failed to detect a JDK. This is expected, because we did not install one.
It is not a problem for our use case: our goal is to benchmark a remote cluster, so Rally does not need to build a local Elasticsearch node, which is what would require a JDK.

You can then press Enter twice to accept the default Rally configuration, until it presents you with:

Configuration successfully written to /home/kelson/.rally/rally.ini. Happy benchmarking!

More info about Rally:

* Type esrally --help
* Read the documentation at https://esrally.readthedocs.io/en/0.11.0/
* Ask a question on the forum at https://discuss.elastic.co/c/elasticsearch/rally

We are now ready to race!
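The generated settings are stored in the rally.ini file shown in the output, and you can re-run the wizard at any time (the --advanced-config flag mentioned above exposes additional options):

$ esrally configure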

Choose your Destiny Track

With Rally configured, we can now start our benchmarking race (the execution of a benchmarking experiment), but first we must choose a track.
Tracks in Rally are simply the different benchmarking scenarios you can choose from, and you can list the default ones with:

$ esrally list tracks

Note that each track has unique properties (documents, size, …), each providing a specific set of documents to be used in the benchmark.
For our scenario, we want to benchmark with as many documents as possible, and the http_logs track seems ideal, as it indexes 247,249,096 documents into the cluster during the benchmark.

Let’s Race

With our track chosen, it is time to start our benchmarking. We can do this through the following command:

$ esrally --track=http_logs --target-hosts=ELASTICSEARCH_IP:ELASTICSEARCH_PORT --pipeline=benchmark-only

Breaking down the command, these are the key parameters:

--track -> the chosen track from the "esrally list tracks" command
--target-hosts -> the remote Elasticsearch cluster that we will benchmark, given as a host:port pair (the port is the one the Elasticsearch service responds on)
--pipeline -> benchmark-only means that we are benchmarking against an existing Elasticsearch instance. List the other options with "esrally list pipelines"
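A full http_logs run can take many hours, so it can be worth validating the setup first. As a minimal sketch (assuming a recent Rally release, with the same placeholder host and port as above), the --test-mode flag runs the track against a tiny subset of the data, and --client-options lets you pass TLS and basic-auth settings if your cluster is secured (the credentials here are placeholders):

$ esrally --track=http_logs --target-hosts=ELASTICSEARCH_IP:ELASTICSEARCH_PORT --pipeline=benchmark-only --test-mode

$ esrally --track=http_logs --target-hosts=ELASTICSEARCH_IP:ELASTICSEARCH_PORT --pipeline=benchmark-only --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'change-me'"

A test-mode race finishes quickly and will not produce meaningful numbers, but it confirms that Rally can reach the cluster and index into it before you commit to the real run.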

Once running, it may take a good while for the benchmarking process to complete, but once it does, it will display your results on the console.

|   Lap |                          Metric |         Task |       Value |    Unit |
|------:|--------------------------------:|-------------:|------------:|--------:|
|   All |                   Indexing time |              |     250.609 |     min |
|   All |          Indexing throttle time |              |           0 |     min |
|   All |                      Merge time |              |     294.864 |     min |
|   All |                    Refresh time |              |     53.2047 |     min |
|   All |                      Flush time |              |      1.5581 |     min |
|   All |             Merge throttle time |              |     170.567 |     min |
|   All |              Total Young Gen GC |              |      57.939 |       s |
|   All |                Total Old Gen GC |              |       5.145 |       s |
|   All |                      Store size |              |     46.5778 |      GB |
|   All |                   Translog size |              |  0.00478806 |      GB |
|   All |          Heap used for segments |              |     104.422 |      MB |
|   All |        Heap used for doc values |              |    0.991104 |      MB |
|   All |             Heap used for terms |              |     85.7431 |      MB |
|   All |             Heap used for norms |              |    0.791382 |      MB |
|   All |            Heap used for points |              |     4.73582 |      MB |
|   All |     Heap used for stored fields |              |     12.1608 |      MB |
|   All |                   Segment count |              |        2294 |         |
|   All |                  Min Throughput | index-append |     5946.26 |  docs/s |
|   All |               Median Throughput | index-append |     6121.22 |  docs/s |
|   All |                  Max Throughput | index-append |     7214.93 |  docs/s |
|   All |         50th percentile latency | index-append |     5700.31 |      ms |
|   All |         90th percentile latency | index-append |     9638.46 |      ms |
|   All |         99th percentile latency | index-append |     15103.5 |      ms |
|   All |       99.9th percentile latency | index-append |     63454.2 |      ms |
|   All |      99.99th percentile latency | index-append |      133296 |      ms |
|   All |        100th percentile latency | index-append |      189127 |      ms |
|   All |    50th percentile service time | index-append |     5700.31 |      ms |
|   All |    90th percentile service time | index-append |     9638.46 |      ms |
|   All |    99th percentile service time | index-append |     15103.5 |      ms |
|   All |  99.9th percentile service time | index-append |     63454.2 |      ms |
|   All | 99.99th percentile service time | index-append |      133296 |      ms |
|   All |   100th percentile service time | index-append |      189127 |      ms |
|   All |                      error rate | index-append |        0.07 |       % |
|   All |                  Min Throughput |      default |        1.44 |   ops/s |
|   All |               Median Throughput |      default |        1.45 |   ops/s |
|   All |                  Max Throughput |      default |        1.46 |   ops/s |
|   All |         50th percentile latency |      default |      310299 |      ms |
|   All |         90th percentile latency |      default |      331423 |      ms |
|   All |         99th percentile latency |      default |      336346 |      ms |
|   All |        100th percentile latency |      default |      336869 |      ms |
|   All |    50th percentile service time |      default |     646.843 |      ms |
|   All |    90th percentile service time |      default |     658.976 |      ms |
|   All |    99th percentile service time |      default |     939.048 |      ms |
|   All |   100th percentile service time |      default |     940.766 |      ms |
|   All |                      error rate |      default |           0 |       % |
|   All |                  Min Throughput |         term |        1.68 |   ops/s |
|   All |               Median Throughput |         term |        1.68 |   ops/s |
|   All |                  Max Throughput |         term |        1.69 |   ops/s |
|   All |         50th percentile latency |         term |      316159 |      ms |
|   All |         90th percentile latency |         term |      338408 |      ms |
|   All |         99th percentile latency |         term |      343543 |      ms |
|   All |        100th percentile latency |         term |      344095 |      ms |
|   All |    50th percentile service time |         term |     569.796 |      ms |
|   All |    90th percentile service time |         term |     576.895 |      ms |
|   All |    99th percentile service time |         term |     800.579 |      ms |
|   All |   100th percentile service time |         term |     1010.78 |      ms |
|   All |                      error rate |         term |           0 |       % |
|   All |                  Min Throughput |        range |        1.02 |   ops/s |
|   All |               Median Throughput |        range |        1.03 |   ops/s |
|   All |                  Max Throughput |        range |        1.03 |   ops/s |
|   All |         50th percentile latency |        range |     46187.2 |      ms |
|   All |         90th percentile latency |        range |     58905.8 |      ms |
|   All |         99th percentile latency |        range |     61606.7 |      ms |
|   All |        100th percentile latency |        range |     61869.4 |      ms |
|   All |    50th percentile service time |        range |     938.223 |      ms |
|   All |    90th percentile service time |        range |     975.216 |      ms |
|   All |    99th percentile service time |        range |     1681.98 |      ms |
|   All |   100th percentile service time |        range |     1878.68 |      ms |
|   All |                      error rate |        range |           0 |       % |
|   All |                  Min Throughput |   hourly_agg |         0.2 |   ops/s |
|   All |               Median Throughput |   hourly_agg |         0.2 |   ops/s |
|   All |                  Max Throughput |   hourly_agg |         0.2 |   ops/s |
|   All |         50th percentile latency |   hourly_agg |     4860.77 |      ms |
|   All |         90th percentile latency |   hourly_agg |     9422.83 |      ms |
|   All |         99th percentile latency |   hourly_agg |     10390.7 |      ms |
|   All |        100th percentile latency |   hourly_agg |     10609.1 |      ms |
|   All |    50th percentile service time |   hourly_agg |        4662 |      ms |
|   All |    90th percentile service time |   hourly_agg |     5086.03 |      ms |
|   All |    99th percentile service time |   hourly_agg |     7019.26 |      ms |
|   All |   100th percentile service time |   hourly_agg |     9548.57 |      ms |
|   All |                      error rate |   hourly_agg |           0 |       % |
|   All |                  Min Throughput |       scroll |        0.39 | pages/s |
|   All |               Median Throughput |       scroll |         0.4 | pages/s |
|   All |                  Max Throughput |       scroll |        0.41 | pages/s |
|   All |         50th percentile latency |       scroll | 1.21201e+07 |      ms |
|   All |         90th percentile latency |       scroll | 1.67727e+07 |      ms |
|   All |         99th percentile latency |       scroll | 1.77756e+07 |      ms |
|   All |        100th percentile latency |       scroll | 1.78884e+07 |      ms |
|   All |    50th percentile service time |       scroll |     58627.6 |      ms |
|   All |    90th percentile service time |       scroll |     73354.6 |      ms |
|   All |    99th percentile service time |       scroll |     86941.1 |      ms |
|   All |   100th percentile service time |       scroll |     87455.5 |      ms |
|   All |                      error rate |       scroll |           0 |       % |


-----------------------------------
[INFO] SUCCESS (took 62851 seconds)
-----------------------------------

Understanding the Race Results

Based on our chosen http_logs track, which contains 247,249,096 documents, Rally’s summary report shows how the cluster behaved against that track.
Let’s now look at some key metrics and what they mean:

— Indexing time: How long the node took to index all 247,249,096 documents.

|   Lap |                          Metric |         Task |       Value |    Unit |
|   All |                   Indexing time |              |     250.609 |     min |

— Indexing throttle time: Total time indexing has been throttled, as reported by the indices stats API (the lower, the better).

|   Lap |                          Metric |         Task |       Value |    Unit |
|   All |          Indexing throttle time |              |           0 |     min |

— Throughput: A measure of how many operations per second the cluster could sustain; for the index-append task, this is reported in documents indexed per second.

|   Lap |                          Metric |         Task |       Value |    Unit |
|   All |                  Min Throughput | index-append |     5946.26 |  docs/s |
|   All |               Median Throughput | index-append |     6121.22 |  docs/s |
|   All |                  Max Throughput | index-append |     7214.93 |  docs/s |

— Error rate: The rate of error response codes or exceptions thrown by the Elasticsearch client. Ideally this should be 0%; if it is higher, the Elasticsearch logs should be inspected to find the root cause of why Elasticsearch failed to index some documents.

|   Lap |                          Metric |         Task |       Value |    Unit |
|   All |                      error rate | index-append |        0.07 |       % |
|   All |                      error rate |      default |           0 |       % |
|   All |                      error rate |         term |           0 |       % |
|   All |                      error rate |        range |           0 |       % |
|   All |                      error rate |   hourly_agg |           0 |       % |
|   All |                      error rate |       scroll |           0 |       % |

The metrics presented here are a small subset of what Rally collects, but they should be enough to give you a better understanding of how well your cluster behaves under heavy indexing.
Getting familiar with the remaining metrics is recommended, and you can find more details on them here.
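If you want to keep the numbers for later analysis rather than just reading them off the console, Rally can also write the summary report to a file. As a small sketch (the output path is a placeholder), the --report-format and --report-file options cover this:

$ esrally --track=http_logs --target-hosts=ELASTICSEARCH_IP:ELASTICSEARCH_PORT --pipeline=benchmark-only --report-format=csv --report-file=~/benchmarks/http_logs_report.csv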

Conclusion

This post presented a quick introduction to Rally, moving from installation to running a benchmark against a remote Elasticsearch cluster in just a few steps. The results the tool produces will deepen your understanding of your cluster and aid operational decision making.
There are many other features not covered in this introductory article, but this was hopefully enough to get you going and allow further experimentation.
Apart from the default tracks, Rally also provides features such as custom tracks and tournaments (comparing the results of two races), but these will be topics for future posts. Stay tuned!
