Saturday, June 1, 2013

Querying elasticsearch

In the previous post, 'Indexing into elasticsearch', we tested our index using curl as a quick alternative. Let's replicate the four curl GET operations from that post using the Java API of ES. We'll use the same superheroes data as before. Here is a quick look at it.


Captain America,19400101,chicken biryaani
Hulk,20050420,kadai paneer
Shaktimaan,19801210,parle G
Ghost Rider,20011010,chicken biryaani

Elasticsearch's query DSL is very powerful, even more so with the new release based on Lucene 4. The Java API provides a QueryBuilders factory which prepares our query builders and offers them on a platter.


Let's get the boilerplate code out of the way:

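import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;

public class Retriever {

  private TransportClient client;

  public Retriever() {
    // Connect to a local node; "elasticsearch" is the default cluster name
    client = new TransportClient(ImmutableSettings
      .settingsBuilder()
      .put("cluster.name", "elasticsearch")
      .build());

    // 9300 is the transport port (the HTTP port used by curl is 9200)
    client.addTransportAddresses(new InetSocketTransportAddress("localhost", 9300));
  }

  public void executeQuery(String queryType, QueryBuilder query) {

    // No index is named in prepareSearch(), so the search runs across all indices
    SearchResponse response = client.prepareSearch()
              .setQuery(query).execute().actionGet();

    SearchHits hits = response.getHits();
    System.out.println(queryType);
    for (SearchHit hit : hits) {
      // getSource() returns the matched document as a map
      System.out.println(hit.getSource());
    }
  }

  public static void main(String[] args) {

    Retriever retriever = new Retriever();

    // We'll put all our 4 queries here
  }
}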

'executeQuery' is our helper method. It takes a description of the query as a string, plus the query object to execute. The query object is passed to the 'setQuery' method on the SearchRequestBuilder returned by 'prepareSearch'. To get at our documents we need to dig into the response object, which is structured like the JSON returned by our curl GET operations. We extract the hits array from it, iterate over it, and read each matched document from its source field using 'hit.getSource()'.

Look at the following JSON response returned from a curl query on superheroes. It gives a clearer picture of the structure of the object we're probing to get to our matched documents.

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.8784157,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "superheroes",
      "_id" : "z5UluGHTQx2gj-FdIuPlrA",
      "_score" : 0.8784157,
      "_source" : {
        "name" : "Captain America",
        "dob" : "19400101",
        "favFood" : "chicken biryaani"
      }
    }]
  }
}
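
To make the path through that structure concrete, here is a minimal sketch that follows the same hits -> hits -> _source route over plain java.util maps instead of a live SearchResponse. The class ResponseWalk and its two methods are made up for this illustration and are not part of the Elasticsearch API. The sample map copies only a few of the fields shown in the response above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only: not an Elasticsearch class.
public class ResponseWalk {

    // Collects the _source map of every entry in hits.hits, which is the
    // same walk the loop over SearchHits performs in executeQuery.
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> extractSources(Map<String, Object> response) {
        Map<String, Object> outerHits = (Map<String, Object>) response.get("hits");
        List<Map<String, Object>> hitArray = (List<Map<String, Object>>) outerHits.get("hits");
        List<Map<String, Object>> sources = new ArrayList<>();
        for (Map<String, Object> hit : hitArray) {
            sources.add((Map<String, Object>) hit.get("_source"));
        }
        return sources;
    }

    // Builds a map shaped like the sample response (only the parts we walk).
    public static Map<String, Object> sampleResponse() {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("name", "Captain America");
        source.put("dob", "19400101");
        source.put("favFood", "chicken biryaani");

        Map<String, Object> hit = new LinkedHashMap<>();
        hit.put("_index", "myindex");
        hit.put("_type", "superheroes");
        hit.put("_score", 0.8784157);
        hit.put("_source", source);

        List<Map<String, Object>> hitArray = new ArrayList<>();
        hitArray.add(hit);

        Map<String, Object> outerHits = new LinkedHashMap<>();
        outerHits.put("total", 1);
        outerHits.put("hits", hitArray);

        Map<String, Object> response = new LinkedHashMap<>();
        response.put("took", 8);
        response.put("hits", outerHits);
        return response;
    }

    public static void main(String[] args) {
        // Prints {name=Captain America, dob=19400101, favFood=chicken biryaani}
        for (Map<String, Object> source : extractSources(sampleResponse())) {
            System.out.println(source);
        }
    }
}
```

With the real client the casts disappear, because SearchHits and SearchHit expose typed accessors for the same fields.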

Java API with guns blazing

Here are the Java equivalents of our curl GET operations from the previous post. They should return the same results when executed.

// 1. Retrieve all documents
QueryBuilder queryAll = QueryBuilders.matchAllQuery();
retriever.executeQuery("Retrieve all documents", queryAll);
  
// 2. Find superheroes based on name (term query)
QueryBuilder termQuery = QueryBuilders.termQuery("name", "captain");
retriever.executeQuery("Find superheroes based on name (term query)", termQuery);
  
// 3. Find superheroes who like chicken biryaani (full text match)
QueryBuilder fullTextSearch = QueryBuilders.matchPhraseQuery("favFood", "chicken biryaani");
retriever.executeQuery("Find superheroes who like chicken biryaani (full text match)", fullTextSearch);
  
// 4. Find superheroes born in the 20th century (range query)
QueryBuilder rangeQuery = QueryBuilders.rangeQuery("dob").from(19000101).to(19991231);
retriever.executeQuery("Find superheroes born in the 20th century (range query)", rangeQuery);
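
Queries 2 to 4 match in different ways, and a small local simulation may help show why the term query searches for the lowercase "captain". The sketch below assumes the fields were indexed with a default-style analyzer that lowercases and splits on word boundaries (an assumption, since the mapping lives in the previous post). QuerySemantics and its methods are hypothetical names, and 'analyze' is only a crude approximation of what Lucene does, not its real implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical simulation, for illustration only: not Elasticsearch code.
public class QuerySemantics {

    // Crude stand-in for an analyzer (NOT Lucene's implementation):
    // lowercase the text, then split on anything that is not a letter or digit.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Term query: the search term is NOT analyzed, so it must equal
    // one of the indexed tokens exactly.
    public static boolean termMatches(String fieldValue, String term) {
        return analyze(fieldValue).contains(term);
    }

    // Phrase query: the query text is analyzed as well, and its tokens must
    // appear next to each other, in the same order, in the field.
    public static boolean phraseMatches(String fieldValue, String phrase) {
        List<String> queryTokens = analyze(phrase);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(analyze(fieldValue), queryTokens) >= 0;
    }

    // Range query: both bounds are treated as inclusive here.
    public static boolean inRange(int value, int from, int to) {
        return value >= from && value <= to;
    }

    public static void main(String[] args) {
        System.out.println(termMatches("Captain America", "captain"));             // true
        System.out.println(termMatches("Captain America", "Captain"));             // false
        System.out.println(phraseMatches("chicken biryaani", "chicken biryaani")); // true
        System.out.println(inRange(19801210, 19000101, 19991231));                 // true
        System.out.println(inRange(20050420, 19000101, 19991231));                 // false
    }
}
```

Under these assumptions, a term query for "Captain" with a capital C would find nothing, while the phrase query is forgiving about case because its text goes through the same analysis as the indexed field.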

Elasticsearch offers a variety of such builders, which help in constructing rich, flexible queries. You can read more about them in the query DSL section of the Elasticsearch documentation.
