ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?

Question

I'm currently looking at other search methods rather than having a huge SQL query. I saw elasticsearch recently and played with whoosh (a Python implementation of a search engine). Can you give reasons for your choice(s)?

User · Answer

We use Lucene regularly to index and search tens of millions of documents. Searches are quick enough, and we use incremental updates that do not take a long time. It did take us some time to get here. The strong points of Lucene are its scalability, a large range of features and an active community of developers. Using bare Lucene requires programming in Java.

If you are starting afresh, the tool for you in the Lucene family is Solr, which is much easier to set up than bare Lucene, and has almost all of Lucene's power. It can import database documents easily. Solr is written in Java, so any modification of Solr requires Java knowledge, but you can do a lot just by tweaking configuration files.

I have also heard good things about Sphinx, especially in conjunction with a MySQL database. Have not used it, though.

IMO, you should choose according to:

The required functionality - e.g. do you need a French stemmer? Lucene and Solr have one, I do not know about the others.
Proficiency in the implementation language - Do not touch Java Lucene if you do not know Java. You may need C++ to do stuff with Sphinx. Lucene has also been ported into other languages. This is mostly important if you want to extend the search engine.
Ease of experimentation - I believe Solr is best in this aspect.
Interfacing with other software - Sphinx has a good interface with MySQL. Solr runs as a RESTful server and supports Ruby, XML and JSON interfaces. Lucene only gives you programmatic access through Java. Compass and Hibernate Search are wrappers of Lucene that integrate it into larger frameworks.
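To make the "RESTful server" point concrete, here is a minimal sketch of how a client might build a request for Solr's standard `/select` handler and ask for a JSON response. The host, the core name `articles` and the field `title` are assumptions made up for illustration; older single-core Solr setups expose the handler at `/solr/select` instead. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select handler that requests a JSON response.

    `core` is a hypothetical core name; `q`, `rows` and `wt` are standard
    Solr query parameters.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance on its default port.
    print(solr_select_url("http://localhost:8983", "articles", "title:lucene"))
```

Any HTTP client in any language can issue the resulting GET request, which is why the implementation language of the server matters less for Solr than for bare Lucene.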

User · Answer

We use Sphinx in a vertical search project with 10,000,000+ MySQL records and 10+ different databases. It has excellent support for MySQL and high indexing performance. Searching is fast, though maybe a little slower than Lucene. However, it's the right choice if you need to index quickly every day and use a MySQL db.

User · Answer

The only elasticsearch vs solr performance comparison I've been able to find so far is here: Solr vs elasticsearch Deathmatch

User · Answer

Lucene is nice and all, but their stop word set is awful. I had to manually add a ton of stop words to StopAnalyzer.ENGLISH_STOP_WORDS_SET just to get it anywhere near usable. I haven't used Sphinx, but I know people swear by its speed and near-magical "ease of setup to awesomeness" ratio.
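As a rough illustration of what extending a stop word set does at analysis time, here is a minimal sketch in plain Python. It does not use Lucene, and both word lists are made up for the example: the base set is NOT Lucene's actual default list, and the extra words are hypothetical additions.

```python
import re

# Illustrative base set only; this is NOT Lucene's real default list.
BASE_STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})

# Hypothetical additions of the kind you might add by hand.
EXTRA_STOP_WORDS = frozenset({"also", "very", "just", "really"})


def tokenize(text, stop_words):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    text = "The index is really just a very big map"
    print(tokenize(text, BASE_STOP_WORDS))
    print(tokenize(text, BASE_STOP_WORDS | EXTRA_STOP_WORDS))
```

The same set has to be applied at index time and at query time, otherwise queries containing the extra words will look for terms that were never indexed.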

User · Answer

An experiment to compare ElasticSearch and Solr

User · Answer

As the creator of ElasticSearch, maybe I can give you some reasoning on why I went ahead and created it in the first place.

Using pure Lucene is challenging. There are many things that you need to take care of if you want it to really perform well. Also, it's a library, so there is no distributed support; it's just an embedded Java library that you need to maintain.

In terms of Lucene usability, way back when (almost 6 years ago now), I created Compass. Its aim was to simplify using Lucene and make everyday Lucene simpler. What I came across time and time again was the requirement to have Compass distributed. I started to work on it from within Compass, by integrating with data grid solutions like GigaSpaces, Coherence, and Terracotta, but it's not enough.

At its core, a distributed Lucene solution needs to be sharded. Also, with the advancement of HTTP and JSON as ubiquitous APIs, a solution can easily be used by many different systems written in different languages.

This is why I went ahead and created ElasticSearch. It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through a JSON DSL.

Solr is also a solution for exposing an indexing/search server over HTTP, but I would argue that ElasticSearch provides a much superior distributed model and ease of use (though it currently lacks some of the search features, but not for long, and in any case, the plan is to get all Compass features into ElasticSearch). Of course, I am biased, since I created ElasticSearch, so you might need to check for yourself.

As for Sphinx, I have not used it, so I can't comment. What I can refer you to is this thread at the Sphinx forum, which I think proves the superior distributed model of ElasticSearch.

Of course, ElasticSearch has many more features than just being distributed. It is actually built with a cloud in mind. You can check the feature list on the site.
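The two ideas in this answer, sharding and a JSON query DSL, can be sketched in a few lines of Python. This is only a toy under stated assumptions: using CRC32 modulo the shard count is a generic routing scheme chosen for illustration, not ElasticSearch's internal routing function, and the document IDs and shard count are made up. The request body uses the `query_string` query, which is part of the ElasticSearch query DSL.

```python
import json
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to a shard with a stable hash.

    Assumption: CRC32 modulo the shard count stands in for whatever hash a
    real engine uses. Every node computes the same result for the same ID.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def query_string_body(text: str, size: int = 10) -> str:
    """Serialize a search request body built around a query_string query."""
    return json.dumps({"query": {"query_string": {"query": text}}, "size": size})


if __name__ == "__main__":
    for doc_id in ("post-1", "post-2", "post-3"):  # hypothetical IDs
        print(doc_id, "-> shard", shard_for(doc_id, 5))
    print(query_string_body("lucene AND solr"))
```

Because the routing is a pure function of the document ID, any node can decide where a document lives without consulting a central index, which is the property a sharded design depends on.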

User · Answer

My sphinx.conf:

    source post_source
    {
        type = mysql

        sql_host =
        sql_user =
        sql_pass =
        sql_db =
        sql_port = 3306

        sql_query_pre = SET NAMES utf8
        # query before fetching rows to index

        sql_query = SELECT *, id AS pid, CRC32(safetag) as safetag_crc32 FROM hb_posts

        sql_attr_uint = pid
        # pid (as 'sql_attr_uint') is necessary for sphinx
        # this field must be unique

        # that is why I like sphinx
        # you can store custom string fields into indexes (memory) as well
        sql_field_string = title
        sql_field_string = slug
        sql_field_string = content
        sql_field_string = tags

        sql_attr_uint = category
        # integer fields must be defined as sql_attr_uint

        sql_attr_timestamp = date
        # timestamp fields must be defined as sql_attr_timestamp

        sql_query_info_pre = SET NAMES utf8
        # if you need unicode support for sql_field_string, you need to patch the source
        # this param is not supported natively

        sql_query_info = SELECT * FROM my_posts WHERE id = $id
    }

    index posts
    {
        source = post_source
        # source above

        path = /var/data/posts
        # index location

        charset_type = utf-8
    }

Test script:

    <?php

        require "sphinxapi.php";

        $safetag = $_GET["my_post_slug"];
        $safetag = preg_replace("/[^a-z0-9\-_]/i", "", $safetag);

        $conf = getMyConf();

        $cl = new SphinxClient();

        $cl->SetServer($conf["server"], $conf["port"]);
        $cl->SetConnectTimeout($conf["timeout"]);
        $cl->SetMaxQueryTime($conf["max"]);

        # set search params
        $cl->SetMatchMode(SPH_MATCH_FULLSCAN);
        $cl->SetArrayResult(TRUE);

        $cl->SetLimits(0, 1, 1);
        # looking for the post (not searching a keyword)

        $cl->SetFilter("safetag_crc32", array(crc32($safetag)));

        # fetch results
        $post = $cl->Query(null, "post_1");

        echo "<pre>";
        var_dump($post);
        echo "</pre>";
        exit("done");
    ?>

Sample result:

    [array] =>
        "id" => 123,
        "title" => "My post title",
        "content" => "My <p>post</p> content",
        ...
        [ and other fields ]

Sphinx query time:

    0.001 sec

Sphinx query time (1k concurrent):

    => 0.346 sec (average)
    => 0.340 sec (average of last 10 queries)

MySQL query time:

    "SELECT * FROM hb_posts WHERE id = 123;"
    => 0.001 sec

MySQL query time (1k concurrent):

    "SELECT * FROM my_posts WHERE id = 123;"
    => 1.612 sec (average)
    => 1.920 sec (average of last 10 queries)
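The lookup above works by filtering on an integer attribute (the CRC32 of the post slug) instead of matching text. Here is a minimal Python sketch of computing that filter value on the client side. The function names are made up for illustration, and it assumes that Python's `zlib.crc32`, MySQL's `CRC32()` and PHP's `crc32()` all implement the same standard CRC-32, which should hold but is worth verifying against your own data.

```python
import re
import zlib


def sanitize_slug(slug: str) -> str:
    """Keep only letters, digits, hyphens and underscores."""
    return re.sub(r"[^a-z0-9\-_]", "", slug, flags=re.IGNORECASE)


def slug_crc32(slug: str) -> int:
    """Return the unsigned 32-bit CRC of the sanitized slug.

    The mask keeps the value in the unsigned range expected of an
    unsigned integer attribute.
    """
    return zlib.crc32(sanitize_slug(slug).encode("utf-8")) & 0xFFFFFFFF


if __name__ == "__main__":
    # Hypothetical slug taken from a request parameter.
    print(slug_crc32("my-first-post"))
```

CRC32 is not collision-free, so two different slugs can map to the same value. If that matters, compare the stored slug of the returned row with the requested one before using the result.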

User · Answer

Try indextank. As with elasticsearch, it was conceived to be much easier to use than lucene/solr. It also includes a very flexible scoring system that can be tweaked without reindexing.

User · Answer

I have used Sphinx, Solr and Elasticsearch. Solr and Elasticsearch are built on top of Lucene and add a lot of common functionality: a web server API, faceting, caching, etc.

If you just want a simple full-text search setup, Sphinx is a better choice.

If you want to customize your search at all, Elasticsearch and Solr are the better choices. They are very extensible: you can write your own plugins to adjust result scoring.

Some example usages:

Sphinx: craigslist.org
Solr: Cnet, Netflix, digg.com
Elasticsearch: Foursquare, Github
