All this power is bound to make anyone wanting to experiment with it a little queasy. Especially if you have no background in data mining or any experience with Lucene, it can take longer than usual to get things rolling. I fall in that very category. Moreover, the Java API doc is scanty and leaves much to the imagination.
If elasticsearch teases you too, then read on; maybe it'll make the road smoother for you.
As we go along glancing over some basics, we'll build a small Java application (part II of this post soon) to index some documents into elasticsearch and test our index using curl. Once this is up and running, you can build on from there and father/mother a beautiful little search application.
Setting up elasticsearch is fairly easy. Download and dump! I'll use Windows, but it works the same on Linux/Mac, whatever your poison is. You can pick up the bundle from here.
Basic Configuration
ES comes prepackaged with some basic configuration that's good to go even for a simple production deployment, so you can either not touch the config file at all, or tweak some options just for fun!
The config file elasticsearch.yml is nested under $ES_HOME\config ($ES_HOME points to your ES root folder). It is pretty much self-explanatory once you start reading through it. Top down, you'll encounter the following sections:
Cluster
If you are on a network, or even locally if you are running multiple projects having their own elasticsearch setup, you might want to change the default cluster name from 'elasticsearch' to, say, your favorite action movie name, so that when you start your own node, it does not merge into some already live cluster running under the same default name.
cluster.name: savingprivateryan
Node
You can safely omit setting up node.name for the time being, as ES will generate one for you dynamically, most likely a Marvel comic hero name.. or not. (We all have our quirks!)
If you want this node to never be a master, which most likely means that you don't want it taking care of the cluster, maybe because it lacks processing power or you simply are a tyrant, set
node.master: false
If stripping it of its power doesn't satisfy your ego and you want it to never hold any data either, then set
node.data: false
ES lets you start multiple nodes from the same installation. You can set an upper limit on that if you want:
node.max_local_storage_nodes: 1
Read the comments slapped all around this section in the actual config file, and everything you've read so far will seem redundant. But then again, didn't I tell you that it's self-explanatory!
Index
You can safely skip configuring this section! I say so because any settings made in this part apply to all the indices under this node. I am assuming you wouldn't want that. We can set these properties individually for the indices we create dynamically using the Java API. But it's always good to know what these do!
index.number_of_shards : Number of shards you want your index to be broken up into. These get distributed to nodes as and when they join your cluster, so as to balance the load. For a local dev setup, you can set it to 1, since you'll have just one machine and hopefully you wouldn't want to run multiple nodes of a cluster on the same machine.
index.number_of_replicas : Number of copies of the entire index you want. This acts as a failsafe and provides data redundancy. So let's say you configure 5 shards and 1 replica, and you create an index. It'll get broken up into 5 shards, and ES will replicate that entire bunch of shards once. You effectively have 10 shards now (5 primaries and 5 copies), which you can load balance across up to 10 nodes, if you decide to add more nodes to your cluster, that is.
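The arithmetic above can be sketched in a few lines of Java. This is only an illustration of the counting; the class and method names are made up for this post and are not part of the ES Java API.

```java
// Illustrative helper (hypothetical names, not an ES class): counts how many
// shard copies an index ends up with for a given shard/replica setting.
final class ShardMath {

    /** Total shard copies = the primaries plus one full extra set per replica. */
    static int totalShards(int primaryShards, int replicas) {
        if (primaryShards < 1 || replicas < 0) {
            throw new IllegalArgumentException("need at least 1 shard and 0 or more replicas");
        }
        return primaryShards * (1 + replicas);
    }

    public static void main(String[] args) {
        // The defaults mentioned in this post: 5 shards, 1 replica.
        System.out.println("5 shards, 1 replica  -> " + totalShards(5, 1) + " shard copies");
        // A single-node dev setup: 1 shard, no replicas.
        System.out.println("1 shard, 0 replicas -> " + totalShards(1, 0) + " shard copy");
    }
}
```

So bumping replicas from 1 to 2 on a 5-shard index takes you from 10 to 15 shard copies, while the number of primaries stays fixed at 5.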
Do keep in mind that the shard count is a one-time setting per index, whether set through the config file (gets applied to all indices) or dynamically through the API (applied per index). Once set for an index, it cannot be changed. However, the number of replicas can still be changed using the update settings API.
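To make the difference concrete, here is a rough sketch of the two JSON request bodies involved, built as plain strings in Java. I am assuming the REST endpoints PUT /{index} for creating an index and PUT /{index}/_settings for updating it; the index name "movies" and the class and method names are made up for illustration, and nothing here talks to a real cluster.

```java
import java.util.Locale;

// Illustrative only (hypothetical names): builds the JSON bodies you would send
// when creating an index and when later changing its replica count.
final class IndexSettingsBodies {

    /** Body for index creation: the only moment the shard count can be chosen. */
    static String createIndexBody(int shards, int replicas) {
        return String.format(Locale.ROOT,
                "{\"settings\":{\"index\":{\"number_of_shards\":%d,\"number_of_replicas\":%d}}}",
                shards, replicas);
    }

    /** Body for the update settings call: replicas only, no shard count. */
    static String updateReplicasBody(int replicas) {
        return String.format(Locale.ROOT,
                "{\"index\":{\"number_of_replicas\":%d}}", replicas);
    }

    public static void main(String[] args) {
        // "movies" is a made-up index name.
        System.out.println("PUT /movies           " + createIndexBody(1, 0));
        System.out.println("PUT /movies/_settings " + updateReplicasBody(1));
    }
}
```

You could paste the printed bodies into a curl call against your local node to try it out.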
If unspecified, all indices are created with 5 shards and 1 replica by default.
Paths
Make path.data point to where you want ES to store your indices. Do the same for path.logs and path.work. By default they are created inside your ES root folder. If you plan on working with a good amount of data, which you probably do, point path.data to a location which you know has ample space.
Network & HTTP
If you want to change something in this section of the file, then you are probably looking at the wrong blog. This section lets you change the default ports on which ES listens for HTTP and TCP (node-to-node transport) requests, 9200 and 9300 respectively, and also lets you bind a specific IP to your current node.
The first node you create will use these default ports. Any node created afterwards on the same host will look for the next available port starting from these defaults, i.e. 9201 and 9301. Shutting down a node frees up its ports immediately.
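If you are curious what "look for the next available port" amounts to, here is a minimal Java sketch of the idea. It is my own illustration of the behaviour described above, not ES's actual code; the class name, the method names and the limit of 100 attempts are all assumptions.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntPredicate;

// Illustrative only (hypothetical names): walks upwards from a default port
// until it finds one that is free.
final class PortScan {

    /** Returns the first port in [start, start + attempts) that isFree accepts, or -1 if none. */
    static int firstFree(int start, int attempts, IntPredicate isFree) {
        for (int port = start; port < start + attempts; port++) {
            if (isFree.test(port)) {
                return port;
            }
        }
        return -1;
    }

    /** A port counts as free if we can briefly bind a server socket to it. */
    static boolean canBind(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 9200 is the default HTTP port, 9300 the default transport port.
        System.out.println("HTTP port candidate:      " + firstFree(9200, 100, PortScan::canBind));
        System.out.println("Transport port candidate: " + firstFree(9300, 100, PortScan::canBind));
    }
}
```

Run it while a node is already up on the same machine and you should see it skip past the ports that node is holding.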
Discovery
By default ES uses multicast to
discover nodes in a network and also to elect the master. If multicast
is disabled for some reason on your network, or you want to avoid the
unnecessary chatter caused by it, or you simply don't care because you
just want to run it locally, set
discovery.zen.ping.multicast.enabled: false
The nodes you add would still need to discover each other for a properly functioning cluster, so do enable unicast and specify which nodes will be used for the discovery process. For a local setup, point discovery.zen.ping.unicast.hosts to localhost.
discovery.zen.ping.unicast.hosts: ["localhost"]
As you can see it accepts an array of
nodes, just in case you want more nodes to participate.
We've skipped some sections of the configuration, as they deal with advanced options and this is only an initiation for a novice user. We're not going to configure anything in our setup, as there is no major need to tweak anything. However, I do suggest you play around a little with these settings to get the hang of it.
I'll add another post soon about building a small application using the Java API of ES. Do let me know if you find any loopholes in this post or if you have any questions!