How to search for a part of a word with ElasticSearch

Question

I've recently started using ElasticSearch and I can't seem to make it search for a part of a word.

Example: I have three documents from my couchdb indexed in ElasticSearch:

    {
      "_id" : "1",
      "name" : "John Doeman",
      "function" : "Janitor"
    }
    {
      "_id" : "2",
      "name" : "Jane Doewoman",
      "function" : "Teacher"
    }
    {
      "_id" : "3",
      "name" : "Jimmy Jackal",
      "function" : "Student"
    }

So now, I want to search for all documents containing "Doe":

    curl http://localhost:9200/my_idx/my_type/_search?q=Doe

That doesn't return any hits. But if I search for

    curl http://localhost:9200/my_idx/my_type/_search?q=Doeman

it does return one document (John Doeman).

I've tried setting different analyzers and different filters as properties of my index. I've also tried using a full-blown query, for example:

    {
      "query": {
        "term": {
          "name": "Doe"
        }
      }
    }

But nothing seems to work.

How can I make ElasticSearch find both John Doeman and Jane Doewoman when I search for "Doe"?

UPDATE

I tried to use the nGram tokenizer and filter, like Igor proposed, like this:

    {
      "index": {
        "index": "my_idx",
        "type": "my_type",
        "bulk_size": "100",
        "bulk_timeout": "10ms",
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "tokenizer": "my_ngram_tokenizer",
              "filter": [
                "my_ngram_filter"
              ]
            }
          },
          "filter": {
            "my_ngram_filter": {
              "type": "nGram",
              "min_gram": 1,
              "max_gram": 1
            }
          },
          "tokenizer": {
            "my_ngram_tokenizer": {
              "type": "nGram",
              "min_gram": 1,
              "max_gram": 1
            }
          }
        }
      }
    }

The problem I'm having now is that each and every query returns ALL documents. Any pointers? ElasticSearch documentation on using nGram isn't great.

User · Answer

Try the solution which is described here: Exact Substring Searches in ElasticSearch

    {
      "mappings": {
        "my_type": {
          "index_analyzer": "index_ngram",
          "search_analyzer": "search_ngram"
        }
      },
      "settings": {
        "analysis": {
          "filter": {
            "ngram_filter": {
              "type": "ngram",
              "min_gram": 3,
              "max_gram": 8
            }
          },
          "analyzer": {
            "index_ngram": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": [ "ngram_filter", "lowercase" ]
            },
            "search_ngram": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": "lowercase"
            }
          }
        }
      }
    }

To solve the disk usage problem and the too-long search term problem, short ngrams of at most 8 characters are used (configured with "max_gram": 8). To search for terms with more than 8 characters, turn your search into a boolean AND query looking for every distinct 8-character substring in that string. For example, if a user searched for large yard (a 10-character string), the search would be:

    "large ya" AND "arge yar" AND "rge yard"
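The splitting step described above can be sketched in a few lines of Python. This is only an illustration, not code from the linked post: the helper names `ngram_chunks` and `build_bool_query` and the field name `name` are made up here, and the sketch assumes the search side lowercases the input, as the `search_ngram` analyzer does.

```python
from typing import Dict, List


def ngram_chunks(term: str, size: int = 8) -> List[str]:
    """Return every distinct substring of length `size`, in order of first appearance."""
    term = term.lower()  # mirrors the lowercase filter of the search analyzer
    if len(term) <= size:
        return [term]  # short terms are searched as they are
    chunks: List[str] = []
    for start in range(len(term) - size + 1):
        chunk = term[start:start + size]
        if chunk not in chunks:
            chunks.append(chunk)
    return chunks


def build_bool_query(field: str, term: str, size: int = 8) -> Dict:
    """Build a bool query whose `must` clauses AND together one term query per chunk."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {field: chunk}} for chunk in ngram_chunks(term, size)]
            }
        }
    }


if __name__ == "__main__":
    print(ngram_chunks("large yard"))
    print(build_bool_query("name", "large yard"))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a `_search` request. Terms shorter than `min_gram` (3 in the settings above) would not match any indexed gram.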

User · Answer

I am using this and got it working:

    "query": {
      "query_string": {
        "query": "*test*",
        "fields": ["field1", "field2"],
        "analyze_wildcard": true,
        "allow_leading_wildcard": true
      }
    }

User · Answer

There are a lot of answers here that focus on solving the issue at hand but don't say much about the trade-offs you need to weigh before choosing a particular approach, so let me add a few more details from that perspective.

Partial search is nowadays a very common and important feature, and if it is not implemented properly it can lead to a poor user experience and bad performance. So first know your application's functional and non-functional requirements for this feature, which I talked about in this detailed SO answer.

There are various approaches: query time, index time, the completion suggester, and the search-as-you-type data type added in recent versions of Elasticsearch.

People who just want to implement a solution quickly can use the end-to-end working solution below.

Index mapping

    {
      "settings": {
        "analysis": {
          "filter": {
            "autocomplete_filter": {
              "type": "ngram",
              "min_gram": 1,
              "max_gram": 10
            }
          },
          "analyzer": {
            "autocomplete": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ]
            }
          }
        },
        "index.max_ngram_diff": 10
      },
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "standard"
          }
        }
      }
    }

Index given sample docs

    {
      "title": "John Doeman"
    }

    {
      "title": "Jane Doewoman"
    }

    {
      "title": "Jimmy Jackal"
    }

And search query

    {
      "query": {
        "match": {
          "title": "Doe"
        }
      }
    }

which returns expected search results

    "hits": [
      {
        "_index": "6467067",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.76718915,
        "_source": {
          "title": "John Doeman"
        }
      },
      {
        "_index": "6467067",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.76718915,
        "_source": {
          "title": "Jane Doewoman"
        }
      }
    ]
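To see why this mapping finds "Doe", it can help to imitate the two analyzers by hand. The Python below is a rough simulation, not Elasticsearch code: splitting on whitespace stands in for the standard tokenizer, and the function names are invented for this sketch. At index time every lowercased token is expanded into its 1 to 10 character grams; at search time the query is only tokenized and lowercased, and a document matches if any query token equals one of its indexed grams.

```python
from typing import Set


def ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> Set[str]:
    """All substrings of `token` whose length is between min_gram and max_gram."""
    return {
        token[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(token) - size + 1)
    }


def index_terms(text: str) -> Set[str]:
    """Imitates the `autocomplete` analyzer: tokenize, lowercase, then ngram."""
    terms: Set[str] = set()
    for token in text.split():  # whitespace split as a stand-in for the standard tokenizer
        terms |= ngrams(token.lower())
    return terms


def search_terms(text: str) -> Set[str]:
    """Imitates the `standard` search analyzer: tokenize and lowercase only."""
    return {token.lower() for token in text.split()}


def matches(document: str, query: str) -> bool:
    """A match query with the default OR operator hits if any query term was indexed."""
    return bool(index_terms(document) & search_terms(query))


if __name__ == "__main__":
    for title in ["John Doeman", "Jane Doewoman", "Jimmy Jackal"]:
        print(title, matches(title, "Doe"))
```

Keeping the ngram filter out of the search analyzer is the design choice that matters here: the query stays a single term ("doe"), so only documents that really contain that substring match.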

User · Answer

If you want to implement autocomplete functionality, then the Completion Suggester is the neatest solution. The next blog post contains a very clear description of how this works.

In two words, it's an in-memory data structure called an FST which contains valid suggestions and is optimised for fast retrieval and memory usage. Essentially, it is just a graph. For instance, an FST containing the words hotel, marriot, mercure, munchen and munich would look like this (figure not reproduced):
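As a rough illustration of the idea, here is a plain trie in Python. It is a deliberate simplification and not how Lucene builds its FST (an FST also shares suffixes and is far more compact); the class name `PrefixIndex` is made up for this sketch. It shows the key property, though: words with a common prefix share one path, so all suggestions for a prefix are found by walking that path once.

```python
from typing import Dict, Iterable, List

END = None  # sentinel key marking the end of a complete word


class PrefixIndex:
    """A minimal trie that returns all stored words starting with a prefix."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root: Dict = {}
        for word in words:
            node = self.root
            for char in word:
                node = node.setdefault(char, {})
            node[END] = word  # store the full suggestion at the end node

    def suggest(self, prefix: str) -> List[str]:
        # Walk down the shared path for the prefix.
        node = self.root
        for char in prefix:
            if char not in node:
                return []
            node = node[char]
        # Collect every complete word below that point.
        results: List[str] = []
        stack = [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is END:
                    results.append(child)
                else:
                    stack.append(child)
        return sorted(results)


if __name__ == "__main__":
    index = PrefixIndex(["hotel", "marriot", "mercure", "munchen", "munich"])
    print(index.suggest("mu"))
```

A structure like this answers "starts with" lookups only; it does not find a substring in the middle of a stored entry.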

User · Answer

Never mind, I had to look at the Lucene documentation. Seems I can use wildcards! :-)

    curl http://localhost:9200/my_idx/my_type/_search?q=*Doe*

does the trick!

User · Answer

Without changing your index mappings you could do a simple prefix query that will do partial searches like you are hoping for, i.e.

    {
      "query": {
        "prefix": { "name": "Doe" }
      }
    }

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html

User · Answer

Searching with leading and trailing wildcards is going to be extremely slow on a large index. If you want to be able to search by word prefix, remove the leading wildcard. If you really need to find a substring in the middle of a word, you would be better off using an ngram tokenizer.

User · Answer

Using wildcards (*) prevents the calculation of a score.

User · Answer

I'm using nGram, too. I use the standard tokenizer and nGram just as a filter. Here is my setup:

    {
      "index": {
        "index": "my_idx",
        "type": "my_type",
        "analysis": {
          "index_analyzer": {
            "my_index_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "mynGram"
              ]
            }
          },
          "search_analyzer": {
            "my_search_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "standard",
                "lowercase",
                "mynGram"
              ]
            }
          },
          "filter": {
            "mynGram": {
              "type": "nGram",
              "min_gram": 2,
              "max_gram": 50
            }
          }
        }
      }
    }

This lets you find word parts up to 50 letters long. Adjust the max_gram as you need. In German, words can get really big, so I set it to a high value.

User · Answer

I think there's no need to change any mapping. Try to use query_string, it's perfect. All scenarios will work with the default standard analyzer.

We have data:

    {"_id" : "1", "name" : "John Doeman", "function" : "Janitor"}
    {"_id" : "2", "name" : "Jane Doewoman", "function" : "Teacher"}

Scenario 1:

    {"query": {
        "query_string" : {"default_field" : "name", "query" : "*Doe*"}
    } }

Response:

    {"_id" : "1", "name" : "John Doeman", "function" : "Janitor"}
    {"_id" : "2", "name" : "Jane Doewoman", "function" : "Teacher"}

Scenario 2:

    {"query": {
        "query_string" : {"default_field" : "name", "query" : "*Jan*"}
    } }

Response:

    {"_id" : "1", "name" : "John Doeman", "function" : "Janitor"}

Scenario 3:

    {"query": {
        "query_string" : {"default_field" : "name", "query" : "*oh* *oe*"}
    } }

Response:

    {"_id" : "1", "name" : "John Doeman", "function" : "Janitor"}
    {"_id" : "2", "name" : "Jane Doewoman", "function" : "Teacher"}

EDIT - Same implementation with spring data elastic search: https://stackoverflow.com/a/43579948/2357869

One more explanation of how query_string is better than others: https://stackoverflow.com/a/43321606/2357869
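The wildcard behaviour in Scenarios 1 and 3 can be imitated offline with Python's `fnmatch` module. This is only a sketch under several assumptions: whitespace splitting plus lowercasing stands in for the standard analyzer, the wildcard patterns are lowercased before matching, the space-separated patterns are combined with OR (the default operator of query_string), and `wildcard_query_matches` is a name invented here.

```python
from fnmatch import fnmatchcase
from typing import List


def analyze(text: str) -> List[str]:
    """Crude stand-in for the standard analyzer: split on whitespace and lowercase."""
    return [token.lower() for token in text.split()]


def wildcard_query_matches(field_value: str, query: str) -> bool:
    """True if any wildcard pattern in `query` matches any indexed term of the field."""
    terms = analyze(field_value)
    patterns = [pattern.lower() for pattern in query.split()]  # assumption: patterns are lowercased
    return any(fnmatchcase(term, pattern) for pattern in patterns for term in terms)


if __name__ == "__main__":
    names = ["John Doeman", "Jane Doewoman", "Jimmy Jackal"]
    for query in ["*Doe*", "*oh* *oe*"]:
        print(query, [name for name in names if wildcard_query_matches(name, query)])
```

Note that each pattern is tested against individual terms, not against the whole field value, so a pattern that spans two words would not match in this model.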

User · Answer

You can use regexp.

    { "_id" : "1", "name" : "John Doeman", "function" : "Janitor" }
    { "_id" : "2", "name" : "Jane Doewoman", "function" : "Teacher" }
    { "_id" : "3", "name" : "Jimmy Jackal", "function" : "Student" }

If you use this query:

    {
      "query": {
        "regexp": {
          "name": "J.*"
        }
      }
    }

you will be given all of the records whose name starts with "J". If you want to receive just the first two records, whose names end with "man", you can use this query:

    {
      "query": {
        "regexp": {
          "name": ".*man"
        }
      }
    }

and if you want to receive all records whose name contains "m", you can use this query:

    {
      "query": {
        "regexp": {
          "name": ".*m.*"
        }
      }
    }

This works for me, and I hope my answer is suitable for solving your problem.
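One detail worth checking before relying on these patterns: a regexp query has to match the whole term, not just part of it, which is why the patterns need the `.*` padding. The Python sketch below imitates that with `re.fullmatch`. It assumes the name field is indexed as a single unanalyzed term (for example a keyword field), which is what the results described above suggest; Python's regex syntax also differs from Lucene's in general, although these simple patterns behave the same in both.

```python
import re
from typing import List

NAMES = ["John Doeman", "Jane Doewoman", "Jimmy Jackal"]


def regexp_matches(pattern: str, values: List[str]) -> List[str]:
    """Return the values that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    return [value for value in values if compiled.fullmatch(value)]


if __name__ == "__main__":
    for pattern in ["J.*", ".*man", ".*m.*", "man"]:
        print(pattern, regexp_matches(pattern, NAMES))
```

The last pattern, a bare `man`, matches nothing, because no complete value equals "man".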
