2017 Mar 09

An introductory look at getting set up with Apache Solr, along with a comparison with Elasticsearch and a look at how it might fit into a Drupal website.

There's been a lot of talk about Elasticsearch as an advanced search solution, and rightly so. It's a great way to get more out of your data by taking advantage of its underlying technology, Apache Lucene, a full-featured text search engine library written in Java. A comparable solution is Apache Solr, a project of the Apache Software Foundation that also uses Lucene under the hood. The natural question is which of the two, Solr or Elasticsearch, is better, and which one you should use.

First, it's important to know that Lucene is a search powerhouse in its own right. It can be used directly, but doing so is generally cumbersome. A good analogy describes Lucene as a car engine and solutions like Elasticsearch and Solr as cars: you can drive a car, but not an engine. The tools that sit on top of Lucene provide a much better developer experience and offer features that are not native to Lucene.

There are dedicated websites that outline the differences between Solr and Elasticsearch, as well as good resources that explain what Lucene is and how using it directly differs from using it through a solution like Solr or Elasticsearch.

Recently we took a look at Solr, having already used Elasticsearch for much of our existing search work. This post documents that setup and experience. First we'll set up Solr as a service, then briefly look at what it takes to use Solr with a Drupal site. The following instructions were tested on a server running 64-bit CentOS 7 Linux.

Installing Solr and setting it up as a service

Installing Java

sudo yum install java-1.8.0-openjdk.x86_64

Once that's installed, we can get the latest version of Solr. It's available from a number of mirrors, as listed on the Solr downloads page.

New versions are released regularly. As of this writing the version is 6.6.5; adjust the version number in the instructions below to match the one you download.
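If you'd rather not edit the version in every command by hand, one option is to keep it in a shell variable and derive the file name and download URL from it. This is only a sketch: the variable names are our own, and the mirror base is simply the one used in the wget command in this post.

```shell
# Version to install; change this one value when a newer release is out.
SOLR_VERSION="6.6.5"

# Mirror base URL, copied from the wget command in this post. Any mirror
# from the Solr downloads page should work here.
SOLR_MIRROR="http://apache.forsale.plus/lucene/solr"

# Derived values used by the download and install steps.
SOLR_TGZ="solr-${SOLR_VERSION}.tgz"
SOLR_URL="${SOLR_MIRROR}/${SOLR_VERSION}/${SOLR_TGZ}"

echo "$SOLR_URL"
```

With these set, the download step becomes wget "$SOLR_URL", and later commands can refer to "$SOLR_TGZ" instead of a hard-coded file name.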

Getting Solr

wget http://apache.forsale.plus/lucene/solr/6.6.5/solr-6.6.5.tgz

Extract only the install shell script from the compressed archive. This step was key to the setup and wasn't really documented, so it's pointed out here.
tar xzf solr-6.6.5.tgz solr-6.6.5/bin/install_solr_service.sh --strip-components=2

Install Solr

The -f flag at the end is only needed if Solr is already installed and you are upgrading versions
sudo bash ./install_solr_service.sh solr-6.6.5.tgz -f

Enable it as a service
sudo systemctl enable solr

NOTE: you may get an error here, depending on whether or not Solr is still using the SysV init.d scripts. To get it to work with systemd, run this:

sudo systemctl daemon-reload
sudo systemctl enable solr

Remove the download file
rm solr-6.6.5.tgz

The Solr interface

Now that we have Solr installed and running as a service, we can visit its provided admin interface at localhost:8983/solr. This admin interface is one big difference that any user of Elasticsearch will notice. Elasticsearch does have other tools to administer itself but none come with it by default. Here in Solr, we get a landing page with basic system info, version information, logging and a page to view Solr "cores".

The provided Solr admin interface on port 8983
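The system information shown on the landing page can also be fetched over HTTP, which is handy for a quick check from the command line that Solr is up. The following is a small sketch, assuming Solr is listening on localhost:8983 as above. The function names are our own; the /admin/info/system endpoint is part of Solr.

```shell
# Base URL of the local Solr instance (assumes the default port).
SOLR_BASE="http://localhost:8983/solr"

# Print the URL of Solr's system info endpoint, asking for JSON output.
solr_info_url() {
  printf '%s/admin/info/system?wt=json' "$SOLR_BASE"
}

# Succeed only if Solr answers on that URL without an HTTP error.
solr_is_up() {
  curl -sf "$(solr_info_url)" > /dev/null
}
```

After defining these, running solr_is_up && echo "Solr is running" gives a simple yes/no answer without opening a browser.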

Creating Solr "cores"

In Solr, a core is composed of a set of configuration files, Lucene index files, and Solr's transaction log. A core is roughly comparable to an index in Elasticsearch. We created ours with the Solr command-line tool directly rather than with a REST call to the API. The install script also created a "solr" user, which we need to use to create the new core without running into permission issues. Here we'll just create a new core called "demo".

sudo -u solr /opt/solr/bin/solr create -c demo
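To confirm the core was created, you can look at the "cores" page in the admin interface, or ask Solr's CoreAdmin API for its status. Here is a minimal sketch, assuming the default port and the "demo" core name used above. The helper function names are hypothetical.

```shell
# Base URL of the local Solr instance (assumes the default port).
SOLR_BASE="http://localhost:8983/solr"

# Print the CoreAdmin STATUS URL for the core named in $1.
core_status_url() {
  printf '%s/admin/cores?action=STATUS&core=%s&wt=json' "$SOLR_BASE" "$1"
}

# Fetch the status of the core named in $1 as JSON.
core_status() {
  curl -s "$(core_status_url "$1")"
}
```

Calling core_status demo should return a JSON document describing the core, including where its index lives on disk.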

Using Solr 

There are a variety of ways to interact with Solr, depending on which tools you're using. At its core, its API is REST based, but client libraries that sit in front of this API are available for many languages. Since we were mainly researching Solr as part of a Drupal site, we used it with the Search API and Search API Solr Search modules that are available for Drupal. The latter module makes use of Solarium, a popular client library for Solr written in PHP. You typically see the same kind of setup with Elasticsearch: instead of using the API directly, you use something more abstract that does the heavy lifting. For Elasticsearch, we've also used Search API in Drupal, together with the Elasticsearch Connector module. That module in turn uses the official PHP client library that Elasticsearch maintains to talk to the API. So it's the same basic setup.
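To give a feel for what those client libraries do under the hood, here is a rough sketch of talking to the REST API directly with curl: one call to index a document and one to search for it. It assumes the "demo" core from earlier on the default port, and that the core accepts the made-up "id" and "title" fields (the default configuration in this Solr version should guess field types for you, but treat that as an assumption). The function names are our own.

```shell
# Base URL of the local Solr instance (assumes the default port).
SOLR_BASE="http://localhost:8983/solr"

# Print the update URL for the core in $1; commit=true makes the
# document searchable right away.
solr_update_url() {
  printf '%s/%s/update?commit=true' "$SOLR_BASE" "$1"
}

# Index one example document into the core in $1.
# The "id" and "title" values are sample data.
solr_index_example() {
  curl -s -X POST -H 'Content-Type: application/json' \
    "$(solr_update_url "$1")" \
    --data-binary '[{"id": "1", "title": "Hello Solr"}]'
}

# Search the core in $1 with the query in $2; curl URL-encodes the query.
solr_search() {
  curl -s -G "$SOLR_BASE/$1/select" \
    --data-urlencode "q=$2" --data-urlencode "wt=json"
}
```

For example, solr_index_example demo followed by solr_search demo 'title:hello' should return the document that was just indexed.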

Query Syntax

One other thing that might appeal to someone looking at Solr is its query syntax. Queries feel closer to plain English and more straightforward to write than in Elasticsearch. The syntax is close to what Lucene itself uses, as seen in the documentation:
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

More examples can be found here:
http://www.solrtutorial.com/solr-query-syntax.html
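As an illustration, here are a few queries in that syntax and one way to send them to Solr. The field names (title, body, price) are hypothetical and would need to exist in your own core. The "demo" core and the default port are assumed, and the run_query helper is our own.

```shell
# Search a single field.
Q_FIELD='title:solr'
# Combine clauses with boolean operators.
Q_BOOL='title:solr AND body:lucene'
# Match an exact phrase.
Q_PHRASE='title:"apache solr"'
# Wildcard match on the end of a term.
Q_WILDCARD='title:sol*'
# Inclusive range query.
Q_RANGE='price:[10 TO 100]'

# Send the query in $1 to the "demo" core; curl URL-encodes it for us.
run_query() {
  curl -s -G "http://localhost:8983/solr/demo/select" \
    --data-urlencode "q=$1" --data-urlencode "wt=json"
}
```

Running run_query "$Q_BOOL" sends the boolean query to the select handler of the core and prints the JSON response.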

Elasticsearch has this ability as well, through its "Query String Query" syntax. However, it's not typically the approach you'd choose in Elasticsearch. The more robust option is to use its structured queries, such as the match query, expressed as JSON objects (or PHP arrays when using the PHP client). Good articles outlining some of the differences between the Solr and Elasticsearch query DSLs can be found online.
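To make the contrast concrete, here is a sketch of the same search expressed both ways against Elasticsearch. It assumes Elasticsearch is listening on localhost:9200. The index and field names are passed in as arguments, and the function names are hypothetical. Note that match_body does no JSON escaping, so it is only suitable for simple values.

```shell
# Base URL of the local Elasticsearch instance (assumes the default port).
ES_URL="http://localhost:9200"

# Query String Query style: a Lucene-like query passed in the q parameter.
# $1 = index name, $2 = query string.
es_query_string() {
  curl -s -G "$ES_URL/$1/_search" --data-urlencode "q=$2"
}

# Build the JSON body of a structured match query.
# $1 = field name, $2 = text to match (no escaping is done).
match_body() {
  printf '{"query":{"match":{"%s":"%s"}}}' "$1" "$2"
}

# Structured style: send the match query as a JSON request body.
# $1 = index name, $2 = field name, $3 = text to match.
es_match() {
  curl -s -H 'Content-Type: application/json' \
    "$ES_URL/$1/_search" -d "$(match_body "$2" "$3")"
}
```

With an index called "articles", for instance, es_query_string articles 'title:solr' and es_match articles title solr express the same search in the two styles.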

Conclusion

That's a basic primer for installing and using Solr. It's also worth mentioning a few Drupal-specific points, since Solr has become very popular in that realm. Given the same basic setup as with Elasticsearch, Solr works much the same from a "backend" indexing point of view. You can install the Search API and Search API Solr Search modules and be up and running. The second of those two modules currently uses the Solarium PHP client library mentioned earlier to actually interface with Solr.

Finally, if the question still remains of which to use, Elasticsearch or Solr, there isn't much difference between the two. The comparison website mentioned earlier (Apache Solr vs Elasticsearch) does call out many points, but few of them are glaring differences. One non-technical difference worth pointing out is the nature of their "open sourceness". Both are released under the Apache license, and both are open source, but they work a little differently. Solr is truly open source - anyone can help and contribute. With Elasticsearch, people can still offer their contributions, but only employees of elastic.co (the company behind Elasticsearch and the Elastic products) can accept those contributions.

Resources