A quick guide to Elasticsearch Java clients [Part 3]
In the previous blog posts (part 1, part 2), we looked at some basic features of the Jest and Spring Data Elasticsearch clients. In this third and final part, we’ll highlight some of the features of the official Java High Level REST Client and draw an overall conclusion for the entire series.
Official Elasticsearch REST API clients
Before we go into the Elasticsearch REST API clients, we will quickly mention native clients.
A native client doesn’t use the REST API. Instead, the whole Elasticsearch application is added as a dependency of the client application, and the application becomes a node that joins the cluster. This node can be configured so that it is not a master node and does not hold any data. Because the application becomes part of the cluster, it is aware of the cluster state and knows which node to go to for data, which makes this setup good in terms of performance. A disadvantage of this approach shows up when we want to scale our application horizontally: we could end up with a bunch of nodes that have joined the cluster but hold no data. In addition, TransportClient, which is used to retrieve a native client instance, is to be deprecated starting with version 7 of Elasticsearch and completely removed by version 8, according to the documentation.
Another option is to use one of the REST clients. Here we’ll showcase the Java High Level REST Client.
There are two types of Java REST client: the Low Level REST Client and the High Level REST Client. The low-level client communicates with the Elasticsearch server over HTTP, internally using the Apache HTTP Async Client to issue requests, but leaves the rest of the work, such as (un)marshaling, to the developer. The high-level client, on the other hand, is built on top of the low-level client, exposes REST API methods, and takes care of the (un)marshaling work.
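To make that division of labour concrete, here is a minimal sketch of the kind of marshaling a low-level client leaves to the application. ManualJson and its methods are hypothetical helpers written for this illustration, not part of any Elasticsearch client, and the escaping covers only quotes and backslashes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of any Elasticsearch client): hand-written
// marshaling of the kind a low-level client leaves to the application.
final class ManualJson {

    // Serialises a flat map of string fields into a JSON object, keeping insertion order.
    static String toJson(Map<String, String> fields) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append('"').append(escape(field.getKey())).append("\":\"")
                .append(escape(field.getValue())).append('"');
            first = false;
        }
        return json.append('}').toString();
    }

    // Escapes backslashes and double quotes only; control characters are not handled.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> comment = new LinkedHashMap<>();
        comment.put("user", "user1");
        comment.put("message", "This is test comment");
        // prints {"user":"user1","message":"This is test comment"}
        System.out.println(toJson(comment));
    }
}
```

Parsing responses would need the reverse of this by hand as well, which is the work the high-level client takes over.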
The Java High Level REST Client requires at least Java 1.8 and needs the Elasticsearch core dependency on the classpath. To use it, add the following dependencies to the project:
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.2.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>6.2.2</version>
</dependency>
in pom.xml for Maven projects, or in build.gradle for Gradle projects:
compile('org.elasticsearch.client:elasticsearch-rest-high-level-client:6.2.2')
compile('org.elasticsearch:elasticsearch:6.2.2')
Configuring the high-level client is pretty straightforward:
RestHighLevelClient esClient = new RestHighLevelClient(
        RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
The high-level client needs to be closed explicitly so that all underlying resources are released. This is done by calling esClient.close() when the client is no longer needed.
There are a few ways to do this. One is to use try-with-resources and create a new client instance every time we need one; it gets closed automatically when the try block exits:
try (RestHighLevelClient esClient = new RestHighLevelClient(
        RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")))) {
    // do something with the client
    // client gets closed automatically
} catch (IOException e) {
    // log any errors
}
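The ordering guarantee that try-with-resources gives can be checked with a small stand-alone sketch. FakeClient is a hypothetical stand-in that only records events; it is not the Elasticsearch client, and the failure is simulated.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client that holds resources; it only records what happens to it.
final class FakeClient implements Closeable {
    private final List<String> events;

    FakeClient(List<String> events) {
        this.events = events;
        events.add("opened");
    }

    // Simulates a request that fails half-way through.
    void search() throws IOException {
        events.add("search");
        throw new IOException("simulated network failure");
    }

    @Override
    public void close() {
        events.add("closed");
    }
}

final class TryWithResourcesDemo {

    // Runs one failing request and returns the recorded sequence of events.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (FakeClient client = new FakeClient(events)) {
            client.search();
        } catch (IOException e) {
            // close() has already run by the time this catch block executes
            events.add("caught: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [opened, search, closed, caught: simulated network failure]
        System.out.println(run());
    }
}
```

The resource is closed before the catch block runs, even though the body threw. The trade-off of this style is that every use pays for constructing and tearing down a client, which is one reason a single shared instance is often preferred in long-running applications.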
Alternatively, if the client application is a Spring Boot application, we could create the client once as a @Bean. One way to close it then is to create a Spring @Component with a @PreDestroy-annotated method that calls esClient.close(), e.g.:
@PreDestroy
public void closeEsClient() {
    try {
        esClient.close();
    } catch (IOException e) {
        System.out.println("Error while closing elasticsearch client");
    }
}
This should at least make sure that the client is closed when the application shuts down gracefully or receives a termination signal (there are scenarios in which this won’t work, but that topic is out of scope here). It works because Spring Boot registers a JVM shutdown hook that closes the application context, which in turn invokes the @PreDestroy methods of singleton beans.
With that out of the way, we can create an index:
CreateIndexRequest request = new CreateIndexRequest("comment");
CreateIndexResponse createIndexResponse = esClient.indices().create(request);
This is a synchronous request. There is also an asynchronous alternative, createAsync(), which takes, besides the request, a listener implementation describing how to handle the response or a failure once the request completes.
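The listener style of call can be sketched without a running cluster. CallbackListener below is a hypothetical interface that only mirrors the two-callback shape (one method for the response, one for the failure), and createIndexAsync is a simulated call, not the client’s createAsync().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in mirroring the two-callback shape of a response listener.
interface CallbackListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

final class AsyncCallDemo {

    // Simulates an asynchronous request: the call returns at once and the
    // listener is notified later from another thread.
    static void createIndexAsync(String indexName, CallbackListener<Boolean> listener) {
        CompletableFuture.runAsync(() -> {
            if (indexName == null || indexName.isEmpty()) {
                listener.onFailure(new IllegalArgumentException("index name is missing"));
            } else {
                listener.onResponse(Boolean.TRUE); // pretend the request was acknowledged
            }
        });
    }

    // Issues the call and blocks (for demonstration only) until a callback fires.
    static String callAndWait(String indexName) {
        AtomicReference<String> outcome = new AtomicReference<>("timed out");
        CountDownLatch done = new CountDownLatch(1);
        createIndexAsync(indexName, new CallbackListener<Boolean>() {
            @Override
            public void onResponse(Boolean acknowledged) {
                outcome.set("acknowledged=" + acknowledged);
                done.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                outcome.set("failed: " + e.getMessage());
                done.countDown();
            }
        });
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(callAndWait("comment")); // prints acknowledged=true
        System.out.println(callAndWait(""));        // prints failed: index name is missing
    }
}
```

In real code the calling thread would not block on a latch; the point of the asynchronous variant is that it can carry on while the listener handles the outcome.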
Using createIndexResponse, we can verify whether all nodes acknowledged the request: createIndexResponse.isAcknowledged().
Similarly, we can issue requests to open (OpenIndexRequest), close (CloseIndexRequest), and delete (DeleteIndexRequest) our index, and verify the acknowledgment from the corresponding responses.
Suppose we have an instance of a simple POJO like this:
public final class Comment {
    private String user;
    private String message;
    // Constructors, getters, setters, and other methods …
}
To index it, one option is to add a JSON mapper such as Jackson to the classpath of our project by adding:
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.4</version>
</dependency>
in pom.xml for Maven projects, or in build.gradle for Gradle projects:
compile('com.fasterxml.jackson.core:jackson-databind:2.9.4')
and then index the Comment instance like this:
Comment comment = new Comment("user1", "This is test comment");
ObjectMapper mapper = new ObjectMapper();
String stringToIndex = mapper.writeValueAsString(comment); // serialize the POJO to JSON
IndexRequest request = new IndexRequest("comment", "comment"); // index name, then mapping type (type name assumed here)
request.source(stringToIndex, XContentType.JSON);
IndexResponse response = esClient.index(request);
if (response.status() == RestStatus.CREATED) {
    System.out.println("Comment indexed");
}
Now we can search for matching comments and retrieve the hits:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.timeout(new TimeValue(600, TimeUnit.SECONDS)); // Request timeout
sourceBuilder.from(0);
sourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC)); // Result set ordering
BoolQueryBuilder query = new BoolQueryBuilder();
query.must(new MatchQueryBuilder("user", "user1"));
query.must(new MatchQueryBuilder("message", "This is test comment"));
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest("comment");
searchRequest.source(sourceBuilder);
final SearchResponse searchResponse = esClient.search(searchRequest);
SearchHits hits = searchResponse.getHits();
The BoolQueryBuilder from the example above will produce the following search query (only the fields important for this example are shown):
{
  "bool" : {
    "must" : [
      {
        "match" : {
          "user" : {
            "query" : "user1"
          }
        }
      },
      {
        "match" : {
          "message" : {
            "query" : "This is test comment"
          }
        }
      }
    ]
  }
}
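To get a feel for what this query selects, here is a deliberately simplified in-memory model of it. It is only a sketch: BoolMustSketch is a hypothetical class, scoring is ignored, and lower-casing plus splitting on whitespace stands in for real text analysis. It assumes the default behaviour of a match query, where a document qualifies if it shares at least one term with the query text, while must requires every clause to hold.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the query above; it ignores scoring and real analyzers.
final class BoolMustSketch {

    // Lower-cases the text and splits it on whitespace, a crude stand-in for analysis.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
        result.remove(""); // split can yield an empty token for blank or padded input
        return result;
    }

    // A match clause succeeds when the field shares at least one term with the query text.
    static boolean match(Map<String, String> doc, String field, String queryText) {
        String value = doc.get(field);
        if (value == null) {
            return false;
        }
        Set<String> shared = tokens(value);
        shared.retainAll(tokens(queryText));
        return !shared.isEmpty();
    }

    // "must" requires every clause to succeed (logical AND).
    static boolean matchesQuery(Map<String, String> doc) {
        return match(doc, "user", "user1")
            && match(doc, "message", "This is test comment");
    }

    // Builds a document with the two fields used in the query.
    static Map<String, String> comment(String user, String message) {
        Map<String, String> doc = new HashMap<>();
        doc.put("user", user);
        doc.put("message", message);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(matchesQuery(comment("user1", "Another comment")));      // true
        System.out.println(matchesQuery(comment("user2", "This is test comment"))); // false
        System.out.println(matchesQuery(comment("user1", "Hello world")));          // false
    }
}
```

Note that the first document qualifies although its message shares only the word "comment" with the query text; in Elasticsearch such a hit would simply score lower than an exact match.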
If there are many hits, it is a good idea to retrieve them in pages. This is achieved using Elasticsearch scrolls. Say we want to search for all comment messages for user1, e.g.:
{
  "bool" : {
    "must" : [
      {
        "match" : {
          "user" : { "query" : "user1" }
        }
      }
    ]
  }
}
First we need to set up the search request similarly to the example above, but with a few additions (note the size setting on the source builder and the scroll setting on the request):
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.timeout(new TimeValue(600, TimeUnit.SECONDS));
sourceBuilder.from(0);
sourceBuilder.size(10); // Size of result hits in scroll
sourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
BoolQueryBuilder query = new BoolQueryBuilder();
query.must(new MatchQueryBuilder("user", "user1"));
sourceBuilder.query(query);
SearchRequest searchRequest = new SearchRequest("comment");
searchRequest.source(sourceBuilder);
searchRequest.scroll(TimeValue.timeValueSeconds(600)); // Keep scroll alive
Then we need to issue the initial request to retrieve the first page of hits and the id of the scroll:
SearchResponse searchResponse = esClient.search(searchRequest);
SearchHits hits = searchResponse.getHits();
String scrollId = searchResponse.getScrollId();
After that, we can retrieve the remaining pages by passing the scroll id to subsequent requests until no more hits are returned:
while (hits != null && hits.getHits().length > 0) {
    // process the current page of hits here
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(TimeValue.timeValueSeconds(600));
    searchResponse = esClient.searchScroll(scrollRequest);
    scrollId = searchResponse.getScrollId(); // always use the most recent scroll id
    hits = searchResponse.getHits();
}
After all search hits are retrieved, we need to make sure to close the scroll:
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = esClient.clearScroll(clearScrollRequest);
if (!clearScrollResponse.isSucceeded()) {
    System.out.println("Could not close scroll with scroll id: " + scrollId);
}
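The whole scroll lifecycle (initial request, follow-up requests until an empty page comes back, then clearing) can be rehearsed against an in-memory stand-in. FakeScroll and ScrollLoopDemo are hypothetical classes written for this sketch; they model only the control flow, not the client API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a server-side scroll: it remembers a position
// between calls, as a scroll context does, and returns an empty page at the end.
final class FakeScroll {
    private final List<String> allHits;
    private final int pageSize;
    private int position = 0;
    private boolean cleared = false;

    FakeScroll(List<String> allHits, int pageSize) {
        this.allHits = allHits;
        this.pageSize = pageSize;
    }

    // Returns the next page of at most pageSize hits, or an empty list when exhausted.
    List<String> nextPage() {
        if (cleared) {
            throw new IllegalStateException("scroll already cleared");
        }
        int end = Math.min(position + pageSize, allHits.size());
        List<String> page = new ArrayList<>(allHits.subList(position, end));
        position = end;
        return page;
    }

    void clear() {
        cleared = true;
    }

    boolean isCleared() {
        return cleared;
    }
}

final class ScrollLoopDemo {

    // Drains a scroll page by page and always clears it afterwards.
    static List<String> readAll(FakeScroll scroll) {
        List<String> collected = new ArrayList<>();
        try {
            List<String> page = scroll.nextPage();   // initial search request
            while (!page.isEmpty()) {
                collected.addAll(page);              // process the current page
                page = scroll.nextPage();            // follow-up scroll request
            }
        } finally {
            scroll.clear();                          // release the scroll context
        }
        return collected;
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add("hit-" + i);
        }
        // 25 hits with a page size of 10 arrive as pages of 10, 10 and 5
        System.out.println(readAll(new FakeScroll(hits, 10)).size()); // prints 25
    }
}
```

Putting the clear step in a finally block is a design choice worth copying to the real client code, so that the scroll context is released even if processing a page fails.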
Besides the query builders already shown, such as BoolQueryBuilder and MatchQueryBuilder, there are many more compound and full-text query builders, alongside other types of builders (term-level, joining, …), which enable the building of sophisticated searches.
The Java High Level REST Client enables Java developers to easily perform both basic and complex operations against the Elasticsearch REST API. Its versions follow the Elasticsearch server versions, which makes migrations that much easier. If the client application’s jar size and memory footprint are not critical, it is a really strong candidate when considering Java clients.
Summary
In this blog post series, we’ve seen how to install, configure, and run a local instance of the Elasticsearch server. We also looked at the options available for Java clients, and at some advantages and drawbacks of each. For each client, we presented small code snippets highlighting features such as client configuration and the main operations: indexing, searching, getting, and deleting.
Only a small example was presented here to get you started quickly with Elasticsearch features. Unit and integration testing are also things to keep in mind when developing Elasticsearch client applications. There are also many more sophisticated configuration options for indexing and fine-tuning searches. To explore those topics further, the reader is encouraged to read through the documentation.
Links and references:
- Elasticsearch Java API (native client) documentation – https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html
- Elasticsearch REST clients documentation – https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/index.html
- Apache Http Async Client – https://hc.apache.org/httpcomponents-asyncclient-dev/index.html
- Jackson mapper documentation – https://github.com/FasterXML/jackson-core/wiki
- Elasticsearch query builders – https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-query-builders.html
- Elasticsearch testing framework – https://www.elastic.co/guide/en/elasticsearch/reference/current/testing-framework.html
A quick guide to ElasticSearch Java Clients – part 1
A quick guide to ElasticSearch Java Clients – part 2
About the author:
Dragan Torbica is a Software Engineer with 7 years of experience mostly in Java and Spring. He believes that software is not something that can be manufactured nor can it be delivered faster by merely adding more lines of code and that writing good software requires skill and careful attention. Currently, he’s working for BrightMarbles.io as Senior Software Developer and Technical Team Lead.