[mongodb] How to use Elasticsearch with MongoDB?

I have gone through many blogs and sites about configuring Elasticsearch to index collections in MongoDB, but none of them were straightforward.

Please explain a step-by-step process for installing Elasticsearch, which should include:

  • configuration
  • run in the browser

I am using Node.js with express.js, so please help accordingly.

This question is related to: mongodb, elasticsearch

Answers:


Here is how to do this on MongoDB 3.0. I used a nice blog post as a guide.

  1. Install mongodb.
  2. Create data directories:
$ mkdir RANDOM_PATH/node1
$ mkdir RANDOM_PATH/node2
$ mkdir RANDOM_PATH/node3
  3. Start the mongod instances:
$ mongod --replSet test --port 27021 --dbpath node1
$ mongod --replSet test --port 27022 --dbpath node2
$ mongod --replSet test --port 27023 --dbpath node3
  4. Configure the replica set:
$ mongo
config = {_id: 'test', members: [ {_id: 0, host: 'localhost:27021'}, {_id: 1, host: 'localhost:27022'}, {_id: 2, host: 'localhost:27023'}]};
rs.initiate(config);
  5. Install Elasticsearch:
a. Download and unzip the latest Elasticsearch distribution.

b. Run bin/elasticsearch to start the es server.

c. Run curl -XGET http://localhost:9200/ to confirm it is working.
  6. Install and configure the MongoDB River:

$ bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb

$ bin/plugin --install elasticsearch/elasticsearch-mapper-attachments

  7. Create the “River” and the index:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{ "type": "mongodb", "mongodb": { "db": "mydb", "collection": "foo" }, "index": { "name": "name", "type": "random" } }'

  8. Test in the browser (see the Express sketch right after these steps):

    http://localhost:9200/_search?q=home
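Since the question mentions Node.js with Express, here is a minimal sketch of exposing the same search from an Express route using only Node's built-in http module. The port and index contents follow the steps above; the route name is an assumption, so adjust to your setup.

const express = require('express');
const http = require('http');

const app = express();

// proxy a simple query-string search to Elasticsearch
app.get('/search', (req, res) => {
  const q = encodeURIComponent(req.query.q || '');
  http.get('http://localhost:9200/_search?q=' + q, (esRes) => {
    let body = '';
    esRes.on('data', (chunk) => { body += chunk; });
    esRes.on('end', () => res.type('json').send(body));
  }).on('error', (err) => res.status(500).send(err.message));
});

app.listen(3000);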


Since mongo-connector now appears dead, my company decided to build a tool that uses MongoDB change streams to output to Elasticsearch.

Our initial results look promising. You can check it out at https://github.com/electionsexperts/mongo-stream. We're still early in development, and would welcome suggestions or contributions.
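For reference, the underlying idea can be sketched in a few lines with the official MongoDB Node.js driver: open a change stream on a collection and index each change into Elasticsearch over HTTP. This is only an illustrative sketch, not mongo-stream itself; the database, collection, index name, and _doc endpoint (Elasticsearch 7+) are assumptions.

const http = require('http');
const { MongoClient } = require('mongodb');

async function run() {
  // change streams require a replica set
  const client = await MongoClient.connect('mongodb://localhost:27017/?replicaSet=test');
  const collection = client.db('mydb').collection('foo');

  collection.watch().on('change', (change) => {
    if (change.operationType !== 'insert' && change.operationType !== 'replace') return;
    // index the full document into Elasticsearch, keyed by its MongoDB _id
    const req = http.request({
      host: 'localhost',
      port: 9200,
      method: 'PUT',
      path: '/myindex/_doc/' + change.documentKey._id,
      headers: { 'Content-Type': 'application/json' }
    }, (res) => res.resume());
    req.end(JSON.stringify(change.fullDocument));
  });
}

run().catch(console.error);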


River is a good solution when you want near real-time synchronization and a general-purpose setup.

If you already have data in MongoDB and want to ship it to Elasticsearch easily as a one-shot import, you can try my Node.js package: https://github.com/itemsapi/elasticbulk.

It uses Node.js streams, so you can import data from anything that supports streams (e.g. MongoDB, PostgreSQL, MySQL, JSON files, etc.).

Example for MongoDB to Elasticsearch:

Install packages:

npm install elasticbulk
npm install mongoose
npm install bluebird

Create a script, e.g. script.js:

const elasticbulk = require('elasticbulk');
const mongoose = require('mongoose');
const Promise = require('bluebird');
mongoose.connect('mongodb://localhost/your_database_name', {
  useMongoClient: true
});

mongoose.Promise = Promise;

var Page = mongoose.model('Page', new mongoose.Schema({
  title: String,
  categories: Array
}), 'your_collection_name');

// stream query 
var stream = Page.find({
}, {title: 1, _id: 0, categories: 1}).limit(1500000).skip(0).batchSize(500).stream();

elasticbulk.import(stream, {
  index: 'my_index_name',
  type: 'my_type_name',
  host: 'localhost:9200',
})
.then(function(res) {
  console.log('Importing finished');
})
.catch(function(err) {
  console.error('Importing failed', err);
});

Ship your data:

node script.js

It's not extremely fast, but it works for millions of records (thanks to streams).


Here is another good option for migrating your MongoDB data to Elasticsearch: Monstache, a Go daemon that syncs MongoDB to Elasticsearch in real time. It's available at https://github.com/rwynn/monstache.

Below are the initial steps to configure and use it.

Step 1:

C:\Program Files\MongoDB\Server\4.0\bin>mongod --smallfiles --oplogSize 50 --replSet test

Step 2:

C:\Program Files\MongoDB\Server\4.0\bin>mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Server has startup warnings:
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
MongoDB Enterprise test:PRIMARY>
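Note: the test:PRIMARY prompt above assumes the replica set has already been initiated. On a fresh deployment, rs.status() will report that no configuration exists yet; in that case initiate it first from the mongo shell (a minimal sketch, assuming the single local member shown above):

// run once in the mongo shell before checking rs.status()
rs.initiate({ _id: "test", members: [{ _id: 0, host: "localhost:27017" }] })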

Step 3: Verify the replication.

MongoDB Enterprise test:PRIMARY> rs.status();
{
        "set" : "test",
        "date" : ISODate("2019-01-18T11:39:00.380Z"),
        "myState" : 1,
        "term" : NumberLong(2),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                }
        },
        "lastStableCheckpointTimestamp" : Timestamp(1547811517, 1),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 736,
                        "optime" : {
                                "ts" : Timestamp(1547811537, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDate" : ISODate("2019-01-18T11:38:57Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1547810805, 1),
                        "electionDate" : ISODate("2019-01-18T11:26:45Z"),
                        "configVersion" : 1,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1547811537, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1547811537, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
MongoDB Enterprise test:PRIMARY>

Step 4: Download Monstache from https://github.com/rwynn/monstache/releases. Unzip the download and add the folder for your platform to your PATH variable. Open a cmd prompt and run "monstache -v" to verify the install (e.g. 4.13.1). Monstache uses the TOML format for its configuration. Create a configuration file for the migration named config.toml.

Step 5.

My config.toml:

mongo-url = "mongodb://127.0.0.1:27017/?replicaSet=test"
elasticsearch-urls = ["http://localhost:9200"]

direct-read-namespaces = [ "admin.users" ]

gzip = true
stats = true
index-stats = true

elasticsearch-max-conns = 4
elasticsearch-max-seconds = 5
elasticsearch-max-bytes = 8000000 

dropped-collections = false
dropped-databases = false

resume = true
resume-write-unsafe = true
resume-name = "default"
index-files = false
file-highlighting = false
verbose = true
exit-after-direct-reads = false

index-as-update=true
index-oplog-time=true

Step 6.

D:\15-1-19>monstache -f config.toml

Monstache is now running.

Confirm the migrated data in Elasticsearch. Then add a record in MongoDB; Monstache captures the event and migrates it to Elasticsearch.
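To double-check from Node.js, you can query the index directly. By default Monstache names the index after the MongoDB namespace, so for the direct-read-namespaces above that would be admin.users; this index name is an assumption based on Monstache's defaults, so adjust it if you configured custom index mappings.

const http = require('http');

// dump whatever landed in the namespace-named index
http.get('http://localhost:9200/admin.users/_search?pretty', (res) => {
  res.pipe(process.stdout);
});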


I found mongo-connector useful. It is from Mongo Labs (MongoDB Inc.) and can now be used with Elasticsearch 2.x.

Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target, then tails the MongoDB oplog, keeping up with operations in MongoDB in real time. It has been tested with Python 2.6, 2.7, and 3.3+. Detailed documentation is available on the wiki.

https://github.com/mongodb-labs/mongo-connector
https://github.com/mongodb-labs/mongo-connector/wiki/Usage%20with%20ElasticSearch
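As a rough sketch of the usage (assuming mongo-connector and the Elastic 2.x doc manager are installed via pip, and that MongoDB is already running as a replica set):

pip install 'mongo-connector[elastic2]'
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager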


Using river can present issues when your operation scales up. River will use a ton of memory under heavy load. I recommend implementing your own Elasticsearch indexing instead; if you're using mongoose, you can build the Elasticsearch indexing right into your models, or use mongoosastic, which essentially does this for you.
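For example, a rough mongoosastic sketch (the schema, field names, and connection settings here are placeholders, not something from the original answer):

const mongoose = require('mongoose');
const mongoosastic = require('mongoosastic');

const ArticleSchema = new mongoose.Schema({
  title: String,
  body: String
});

// mirror saves/removes of this model into Elasticsearch
ArticleSchema.plugin(mongoosastic, { host: 'localhost', port: 9200 });

const Article = mongoose.model('Article', ArticleSchema);

// full-text search goes straight to Elasticsearch
Article.search({ query_string: { query: 'mongodb' } }, (err, results) => {
  if (err) throw err;
  console.log(results.hits.hits);
});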

Another disadvantage of the MongoDB River is that you'll be stuck using the MongoDB 2.4.x branch and ElasticSearch 0.90.x. You'll start to find that you're missing out on a lot of really nice features, and the MongoDB River project just doesn't produce a usable product fast enough to stay stable. That said, the MongoDB River is definitely not something I'd go into production with. It has posed more problems than it's worth. It will randomly drop writes under heavy load, it will consume lots of memory, and there's no setting to cap that. Additionally, river doesn't update in real time; it reads the oplog from MongoDB, and this can delay updates for as long as 5 minutes in my experience.

We recently had to rewrite a large portion of our project because it was a weekly occurrence that something went wrong with ElasticSearch. We had even gone as far as to hire a DevOps consultant, who also agrees that it's best to move away from River.

UPDATE: Elasticsearch-mongodb-river now supports ES v1.4.0 and MongoDB v2.6.x. However, you'll still likely run into performance problems on heavy insert/update operations, as this plugin tries to read MongoDB's oplog to sync. If a lot of operations have accumulated by the time the lock (or latch, rather) releases, you'll notice extremely high memory usage on your ElasticSearch server. If you plan on having a large operation, river is not a good option. The developers of ElasticSearch still recommend that you manage your own indexes by communicating directly with their API using the client library for your language, rather than using river. That isn't really the purpose of river. Twitter-river is a great example of how river should be used: it's essentially a great way to pull in data from outside sources, but not very reliable for high traffic or internal use.

Also consider that mongodb-river falls behind in version support, as it's not maintained by the ElasticSearch organization; it's maintained by a third party. Development was stuck on the v0.90 branch for a long time after the release of v1.0, and when a version for v1.0 was released it wasn't stable until ElasticSearch released v1.3.0. The supported MongoDB versions also fall behind. You may find yourself in a tight spot when you're looking to move to a later version of each, especially with ElasticSearch under such heavy development and many highly anticipated features on the way. Staying up to date with the latest ElasticSearch has been very important for us, as we rely heavily on constantly improving our search functionality; it's a core part of our product.

All in all, you'll likely get a better product if you do it yourself. It's not that difficult: it's just another database to manage in your code, and it can easily be dropped into your existing models without major refactoring.
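A rough sketch of that do-it-yourself approach, using a mongoose post-save hook and the legacy elasticsearch client (index, type, and field names here are placeholders):

const mongoose = require('mongoose');
const elasticsearch = require('elasticsearch');

const esClient = new elasticsearch.Client({ host: 'localhost:9200' });

const PageSchema = new mongoose.Schema({ title: String, categories: Array });

// index each saved document yourself instead of relying on river and the oplog
PageSchema.post('save', (doc) => {
  esClient.index({
    index: 'pages',
    type: 'page',
    id: String(doc._id),
    body: { title: doc.title, categories: doc.categories }
  }).catch(console.error);
});

const Page = mongoose.model('Page', PageSchema);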