Elasticsearch is a powerful search engine based on the Apache Lucene library.
This tutorial is about learning the basics of Elasticsearch, ideally covering all the topics one needs to know to pass the certification exams (Elasticsearch Engineer I & II).
This tutorial doesn't exactly follow the content of the official training, but you will learn to run Elasticsearch on a small single-node cluster, running locally for free on your laptop.
More specifically, you will learn to do the following:
If you are looking for the official Elastic training and certifications, you should check these links:
https://training.elastic.co/instructor-led-training/ElasticsearchEngineerI-Virtual
https://training.elastic.co/instructor-led-training/ElasticsearchEngineerII-Virtual
Normal steps to install a single-node Elastic environment would include downloading and extracting the archives (see the sketch after these commands), then starting Elasticsearch and Kibana:
<path_to_elasticsearch_root_dir>/bin/elasticsearch
<path_to_kibana_root_dir>/bin/kibana
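Before running those, you would first download and extract the archives from the Elastic website, e.g. (the version below is purely illustrative – check https://www.elastic.co/downloads for the current release):

# Illustrative version number – adapt to the current release
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.4.0.tar.gz
tar -xzf elasticsearch-6.4.0.tar.gz
wget https://artifacts.elastic.co/downloads/kibana/kibana-6.4.0-linux-x86_64.tar.gz
tar -xzf kibana-6.4.0-linux-x86_64.tar.gz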
Described below is another way to run Elasticsearch, from within a container using Docker. This allows for a simpler, cleaner (but temporary!) installation of the Elastic stack, which makes it practical for learning purposes.
On Linux, use sysctl vm.max_map_count on the host to view the current value, and see Elasticsearch's documentation on virtual memory for guidance on how to change this value. Note that the limits must be changed on the host; they cannot be changed from within a container.
~$ sysctl vm.max_map_count
vm.max_map_count = 65530
On Linux, you can increase the limits by running the following command as root:
sysctl -w vm.max_map_count=262144
To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf. To verify the change after rebooting, run sysctl vm.max_map_count again.
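For example, the line to add to /etc/sysctl.conf is:

# /etc/sysctl.conf – persists the limit across reboots
vm.max_map_count=262144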
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.
By doing so, thanks to the container, the developer can rest assured that the application will run on any other Linux machine regardless of any customized settings that machine might have that could differ from the machine used for writing and testing the code.
This section describes how to use the sebp/elk Docker image, which provides a convenient centralised log server and log management web interface, by packaging Elasticsearch, Logstash, and Kibana, collectively known as ELK.
This type of installation is recommended to get started, but you might be limited later on when you need to configure Elasticsearch and restart it to apply the changes. So if you need to change your configuration, you will have to (re-)build your Docker image locally.
Note that the RPM and Debian packages of Elasticsearch configure the vm.max_map_count setting automatically; in that case no further configuration is required.
To pull the image from the Docker registry, open a shell prompt and enter:
docker pull sebp/elk
or this one if you haven't configured docker for your own user:
sudo docker pull sebp/elk
Run a container from the image with the following command:
docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk
Note - The whole ELK stack will be started. See the Starting services selectively section to selectively start part of the stack.
This command publishes the following ports, which are needed for proper operation of the ELK stack:
5601 (Kibana web interface): http://localhost:5601
9200 (Elasticsearch JSON interface): http://localhost:9200/
5044 (Logstash Beats interface, receives logs from Beats such as Filebeat – see the Forwarding logs with Filebeat section).

This procedure is of course more cumbersome than pulling a pre-made Docker image, but it will allow us to tweak the configuration of our Elasticsearch instance.
Here we will:
>> Clone the official image repository:
/$ cd /tmp
/tmp$ git clone https://github.com/spujadas/elk-docker.git
>> Make your changes to the configuration of Elasticsearch:
...
(copy the configuration example from the next section here)
>> Build and run the ELK containers from that image:
You may need to remove any former container named 'elk' first (docker rm elk).
/tmp$ cd elk-docker/
/tmp/elk-docker$ docker-compose build elk
/tmp/elk-docker$ docker-compose up
Go take a cup of coffee or 3 :coffee::coffee::coffee:, it might take 5-10 minutes!
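Once the services are up, a quick sanity check against the default port confirms that Elasticsearch is answering:

curl http://localhost:9200/

Elasticsearch should reply with a small JSON document giving, among other things, the node name, cluster name, and version.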
Elasticsearch ships with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster Update Settings API.
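For instance, a dynamic setting can be changed on a live cluster with a single call (the recovery throttle below is just an illustrative example of such a setting):

curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "50mb"
  }
}'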
The configuration files should contain settings which are node-specific (such as node.name and paths), or settings which a node requires in order to be able to join a cluster, such as cluster.name and network.host.
You can configure:
jvm.options (JVM settings, such as the heap size)
log4j2.properties (logging configuration)
elasticsearch.yml (the main Elasticsearch configuration)
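For example, a minimal elasticsearch.yml for a single-node setup might look like this (all values are illustrative):

# config/elasticsearch.yml
cluster.name: my-cluster
node.name: node-1
network.host: 0.0.0.0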
Important Elasticsearch configuration mostly consists of settings which need to be considered before going into production, such as path.data and path.logs, cluster.name, node.name, network.host, the discovery settings, and the JVM heap size.
You have two options to index the data into Elasticsearch.
Enter your local Elasticsearch single-node cluster by opening a shell inside the Docker container running it:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bab86b772b05 sebp/elk "/usr/local/bin/star..." 2 hours ago Up 2 hours 5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp infallible_saha
$ docker exec -i -t bab86b772b05 /bin/bash
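From inside the container, you can talk to Elasticsearch directly, for example to check the cluster health:

curl http://localhost:9200/_cluster/health?pretty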
Using the option to restore a snapshot involves four easy steps:
# Create snapshots directory
mkdir elastic_restaurants
cd elastic_restaurants
# Download index snapshot to elastic_restaurants directory
wget http://download.elasticsearch.org/demos/nyc_restaurants/nyc_restaurants-5-4-3.tar.gz
# Uncompress snapshot file
tar -xf nyc_restaurants-5-4-3.tar.gz
This adds a nyc_restaurants subfolder containing the index snapshots. Add the nyc_restaurants dir to the path.repo setting in elasticsearch.yml, in the <path_to_elasticsearch_root_dir>/config/ folder (see the example below), then restart Elasticsearch for the change to take effect.
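For example, keeping the path placeholder used above:

# config/elasticsearch.yml
path.repo: ["<path_to_nyc_restaurants>/"]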
With Docker, any changes to your Elasticsearch configuration will be lost after a restart of the Docker container. One solution is therefore to edit the Docker image before re-running it, so that you can apply the changes mentioned above.
Register a file system repository for the snapshot (change the value of the "location" parameter below to the location of your nyc_restaurants directory):
curl -H "Content-Type: application/json" -XPUT ‘http://localhost:9200/_snapshot/restaurants_backup' -d ‘{
"type": "fs",
"settings": {
"location": "<path_to_nyc_restaurants>/",
"compress": true,
"max_snapshot_bytes_per_sec": "1000mb",
"max_restore_bytes_per_sec": "1000mb"
}
}'
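Once the repository is registered, you can list the snapshots it contains and then restore one by name (the <snapshot_name> below is a placeholder – use a name returned by the first call):

# List the snapshots available in the repository
curl 'http://localhost:9200/_snapshot/restaurants_backup/_all?pretty'
# Restore a snapshot by name
curl -XPOST 'http://localhost:9200/_snapshot/restaurants_backup/<snapshot_name>/_restore'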