Elasticsearch

Elasticsearch is a powerful search engine based on the Apache Lucene library.

This tutorial is about learning the basics of Elasticsearch, going through the topics one needs to know to pass the certification exams (Elasticsearch Engineer I & II).

What you will learn

This tutorial doesn't exactly follow the content of the official training, but you will learn to run Elasticsearch on a small single-node cluster, running locally and for free on your laptop.

More specifically, the sections below walk you through installing Elasticsearch locally (natively or with Docker), configuring it, and loading data into it.

If you are looking for the official Elastic training and certifications, check these links:

Become an Elastic Certified Engineer

Elastic Engineer I: https://training.elastic.co/instructor-led-training/ElasticsearchEngineerI-Virtual

Elastic Engineer II: https://training.elastic.co/instructor-led-training/ElasticsearchEngineerII-Virtual

Local installation of Elasticsearch

Normal steps to install a single-node Elastic environment would include:

  1. Install Java
  2. Download and set up Elasticsearch
  3. Run Elasticsearch: <path_to_elasticsearch_root_dir>/bin/elasticsearch
  4. Run Kibana: <path_to_kibana_root_dir>/bin/kibana
  5. Verify that your installation is working, as shown below:
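
Assuming the default ports (9200 for Elasticsearch, 5601 for Kibana), a quick check from the shell:

# Elasticsearch should answer with a JSON document giving the cluster name and version
curl http://localhost:9200/

# Kibana should serve its web interface on port 5601; this prints the HTTP status code
# (or simply open http://localhost:5601/ in a browser)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5601/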

Described below is an alternative way to run Elasticsearch from within a container, using Docker. This allows for a simpler, cleaner (but temporary!) installation of the Elastic stack, which makes it practical for learning purposes.

Prerequisite for any installation of Elasticsearch

On Linux, use sysctl vm.max_map_count on the host to view the current value, and see Elasticsearch's documentation on virtual memory for guidance on how to change this value. Note that the limits must be changed on the host; they cannot be changed from within a container.

~$ sysctl vm.max_map_count 
vm.max_map_count = 65530

On Linux, you can increase the limits by running the following command as root:

sysctl -w vm.max_map_count=262144
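
To make the change permanent across reboots, the value can also be written to /etc/sysctl.conf (standard sysctl behaviour, not specific to Elasticsearch):

# Persist the setting, then reload sysctl configuration
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p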

Dockerized version of Elasticsearch

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package an application with all of the parts it needs, such as libraries and other dependencies, and ship it all as a single package.
The container thus guarantees that the application will run on any other Linux machine, regardless of any customized settings that machine might have that could differ from the machine used for writing and testing the code.

This section describes how to use the sebp/elk Docker image, which provides a convenient centralised log server and log management web interface, by packaging Elasticsearch, Logstash, and Kibana, collectively known as ELK.

For more details, see the official documentation of the image.

Installation from an official Docker image

This type of installation is recommended for getting started, but you may find it limiting later on when you need to change Elasticsearch's configuration and restart it to apply the changes. If that happens, you will have to (re-)build your Docker image locally, as described in the next section.

(As a side note on the vm.max_map_count prerequisite above: the RPM and Debian packages of Elasticsearch configure this setting automatically, so no further configuration is required when installing from those packages.)

To pull the image from the Docker registry, open a shell prompt and enter:

docker pull sebp/elk

or this one if you haven't configured Docker to be usable by your own (non-root) user:

sudo docker pull sebp/elk
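
You can then check that the image is available locally:

# List the locally available versions of the sebp/elk image
docker images sebp/elk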

Usage

Run a container from the image with the following command:

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

Note - The whole ELK stack will be started. See the Starting services selectively section to selectively start part of the stack.

This command publishes the following ports, which are needed for proper operation of the ELK stack:

  - 5601: Kibana web interface
  - 9200: Elasticsearch JSON interface
  - 5044: Logstash Beats interface (receives logs from Beats such as Filebeat)
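
If you would rather get your shell back, the same container can be started in the background with Docker's -d flag (standard Docker usage, nothing specific to this image):

docker run -d -p 5601:5601 -p 9200:9200 -p 5044:5044 --name elk sebp/elk

# Follow the startup logs until the stack is up
docker logs -f elk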

Installation from a custom Docker image

This procedure is of course more cumbersome than pulling a pre-made Docker image, but it will allow us to tweak the configuration of our Elasticsearch instance.

Here we will:

  1. Clone the repository of the official image
  2. Make our changes to the configuration of Elasticsearch
  3. Build and run the ELK containers from the modified image

>> Clone the repository of the official image:

/$ cd /tmp
/tmp$ git clone https://github.com/spujadas/elk-docker.git

>> Make your changes to the configuration of Elasticsearch:

For example, to prepare for the snapshot-restore exercise later in this tutorial, you can add the directory of the downloaded snapshots to the path.repo setting in Elasticsearch's configuration.
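
A sketch of the line to add to elasticsearch.yml (the exact snapshot path is an assumption, it depends on where you copy the snapshot data into the image):

path.repo: ["<path_to_nyc_restaurants>"]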

>> Build and run the ELK containers from that image:

You may first need to remove any former container named elk (for example, the one started in the previous section):
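
# Stop and remove the previous elk container, if any (assumes it was started with --name elk)
docker rm -f elk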

/tmp$ cd elk-docker/
/tmp/elk-docker$ docker-compose build elk
/tmp/elk-docker$ docker-compose up

Go grab a cup of coffee or three :coffee::coffee::coffee:, the build might take 5-10 minutes!

Configuring Elasticsearch

Documentation

Elasticsearch ships with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster Update Settings API.
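
For example, a setting can be changed on a live cluster through that API; a minimal sketch (the allocation setting shown here is only an illustration):

curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "cluster.routing.allocation.enable": "all" }
}'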

The configuration files should contain settings which are node-specific (such as node.name and paths), or settings which a node requires in order to be able to join a cluster, such as cluster.name and network.host.

You can configure Elasticsearch through three files, found in its config directory:

  1. elasticsearch.yml for configuring Elasticsearch
  2. jvm.options for configuring Elasticsearch JVM settings
  3. log4j2.properties for configuring Elasticsearch logging

Important Elasticsearch configuration

These are mostly settings which need to be considered before going into production:

  - path.data and path.logs
  - cluster.name
  - node.name
  - network.host
  - discovery settings
  - JVM heap size
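
As an illustration, a minimal elasticsearch.yml for a single-node learning setup might look like this (all values here are placeholders to adapt):

cluster.name: tutorial-cluster
node.name: node-1
network.host: 127.0.0.1
path.data: <path_to_data_dir>
path.logs: <path_to_logs_dir>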

You have two options to index the data into Elasticsearch: restoring an index snapshot, or processing and loading the data with a script. Both are described below.

Load data by restoring index snapshot


Example with NYC restaurants

Enter your local Elasticsearch single-node cluster by entering the Docker container running it:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                                              NAMES
bab86b772b05        sebp/elk            "/usr/local/bin/star..."   2 hours ago         Up 2 hours          5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp                                             infallible_saha
$ docker exec -i -t bab86b772b05 /bin/bash

Using the option to restore a snapshot involves 4 easy steps:

  1. Download and uncompress the index snapshot .tar.gz file into a local folder
# Create snapshots directory
mkdir elastic_restaurants
cd elastic_restaurants
# Download index snapshot to elastic_restaurants directory
wget http://download.elasticsearch.org/demos/nyc_restaurants/nyc_restaurants-5-4-3.tar.gz
# Uncompress snapshot file
tar -xf nyc_restaurants-5-4-3.tar.gz

This adds a nyc_restaurants subfolder containing the index snapshots.

  2. Add the nyc_restaurants dir to the path.repo variable in elasticsearch.yml in the <path_to_elasticsearch_root_dir>/config/ folder (see the example in the custom-image section above). Restart Elasticsearch for the change to take effect.

With Docker, any changes to your Elasticsearch configuration will be lost after a restart of the Docker container.

One solution is therefore to rebuild the Docker image with the modified configuration before re-running it, applying the changes mentioned above as described in the custom-image section.

  3. Register a file system repository for the snapshot (change the value of the "location" parameter below to the path of your nyc_restaurants directory):

curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/_snapshot/restaurants_backup' -d '{
    "type": "fs",
    "settings": {
        "location": "<path_to_nyc_restaurants>/",
        "compress": true,
        "max_snapshot_bytes_per_sec": "1000mb",
        "max_restore_bytes_per_sec": "1000mb"
    }
}'
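
  4. Restore the snapshot into your cluster. The snapshot name is not fixed here, so list the snapshots contained in the repository first, then restore one of them and check the resulting indices:

# List the snapshots available in the restaurants_backup repository
curl -XGET 'http://localhost:9200/_snapshot/restaurants_backup/_all'

# Restore one of them (replace <snapshot_name> with a name from the list above)
curl -XPOST 'http://localhost:9200/_snapshot/restaurants_backup/<snapshot_name>/_restore'

# Verify that the restored indices are present
curl -XGET 'http://localhost:9200/_cat/indices?v'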

Process and load data using a Python script

The second option is to process the raw data and load it into Elasticsearch with a script; see the example script.
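
Under the hood, such a script typically converts the source data to newline-delimited JSON and sends it in batches to Elasticsearch's _bulk API. Below is a minimal sketch of the equivalent calls with curl; the restaurants index name and both documents are made up for illustration:

# Build a hypothetical two-document bulk payload
# (on Elasticsearch 5.x, the action lines would also need a "_type" field)
cat > payload.ndjson <<'EOF'
{ "index": { "_index": "restaurants" } }
{ "name": "Luigi's Pizza", "borough": "Manhattan" }
{ "index": { "_index": "restaurants" } }
{ "name": "Sushi Ko", "borough": "Brooklyn" }
EOF

# Send the payload; _bulk requires the newline-delimited JSON content type
curl -H "Content-Type: application/x-ndjson" -XPOST 'http://localhost:9200/_bulk' --data-binary @payload.ndjson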