[mongodb] How to use Elasticsearch with MongoDB?

I have gone through many blogs and sites about configuring Elasticsearch to index collections in MongoDB, but none of them were straightforward.

Please explain a step-by-step process for installing Elasticsearch, which should include:

  • configuration
  • run in the browser

I am using Node.js with express.js, so please help accordingly.

This question is related to: mongodb, elasticsearch

Answers:


Here is how to do this on MongoDB 3.0. I used a nice blog post as a guide.

  1. Install mongodb.
  2. Create data directories:
$ mkdir RANDOM_PATH/node1
$ mkdir RANDOM_PATH/node2
$ mkdir RANDOM_PATH/node3
  3. Start the mongod instances:
$ mongod --replSet test --port 27021 --dbpath node1
$ mongod --replSet test --port 27022 --dbpath node2
$ mongod --replSet test --port 27023 --dbpath node3
  4. Configure the replica set:
$ mongo
config = {_id: 'test', members: [ {_id: 0, host: 'localhost:27021'}, {_id: 1, host: 'localhost:27022'}, {_id: 2, host: 'localhost:27023'}]};
rs.initiate(config);
  5. Install Elasticsearch:
a. Download and unzip the latest Elasticsearch distribution.

b. Run bin/elasticsearch to start the es server.

c. Run curl -XGET http://localhost:9200/ to confirm it is working.
  6. Install and configure the MongoDB River:

$ bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb

$ bin/plugin --install elasticsearch/elasticsearch-mapper-attachments

  7. Create the “River” and the index:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{ "type": "mongodb", "mongodb": { "db": "mydb", "collection": "foo" }, "index": { "name": "name", "type": "random" } }'

  8. Test in the browser (see the Express sketch right after these steps):

    http://localhost:9200/_search?q=home
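Since the question mentions Node.js with Express, here is a minimal sketch of exposing the same search from an Express route using only Node's built-in http module. The port and index contents follow the steps above; the route name is an assumption, so adjust to your setup.

const express = require('express');
const http = require('http');

const app = express();

// proxy a simple query-string search to Elasticsearch
app.get('/search', (req, res) => {
  const q = encodeURIComponent(req.query.q || '');
  http.get('http://localhost:9200/_search?q=' + q, (esRes) => {
    let body = '';
    esRes.on('data', (chunk) => { body += chunk; });
    esRes.on('end', () => res.type('json').send(body));
  }).on('error', (err) => res.status(500).send(err.message));
});

app.listen(3000);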


Since mongo-connector now appears dead, my company decided to build a tool that uses MongoDB change streams to output to Elasticsearch.

Our initial results look promising. You can check it out at https://github.com/electionsexperts/mongo-stream. We're still early in development, and would welcome suggestions or contributions.
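For reference, the underlying idea can be sketched in a few lines with the official MongoDB Node.js driver: open a change stream on a collection and index each change into Elasticsearch over HTTP. This is only an illustrative sketch, not mongo-stream itself; the database, collection, index name, and _doc endpoint (Elasticsearch 7+) are assumptions.

const http = require('http');
const { MongoClient } = require('mongodb');

async function run() {
  // change streams require a replica set
  const client = await MongoClient.connect('mongodb://localhost:27017/?replicaSet=test');
  const collection = client.db('mydb').collection('foo');

  collection.watch().on('change', (change) => {
    if (change.operationType !== 'insert' && change.operationType !== 'replace') return;
    // index the full document into Elasticsearch, keyed by its MongoDB _id
    const req = http.request({
      host: 'localhost',
      port: 9200,
      method: 'PUT',
      path: '/myindex/_doc/' + change.documentKey._id,
      headers: { 'Content-Type': 'application/json' }
    }, (res) => res.resume());
    req.end(JSON.stringify(change.fullDocument));
  });
}

run().catch(console.error);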


River is a good solution when you want near real-time synchronization and a general-purpose setup.

If you already have data in MongoDB and want to ship it to Elasticsearch easily as a one-shot import, you can try my Node.js package: https://github.com/itemsapi/elasticbulk.

It uses Node.js streams, so you can import data from anything that supports streams (e.g. MongoDB, PostgreSQL, MySQL, JSON files, etc.).

Example for MongoDB to Elasticsearch:

Install packages:

npm install elasticbulk
npm install mongoose
npm install bluebird

Create a script, e.g. script.js:

const elasticbulk = require('elasticbulk');
const mongoose = require('mongoose');
const Promise = require('bluebird');
mongoose.connect('mongodb://localhost/your_database_name', {
  useMongoClient: true
});

mongoose.Promise = Promise;

var Page = mongoose.model('Page', new mongoose.Schema({
  title: String,
  categories: Array
}), 'your_collection_name');

// stream query 
var stream = Page.find({
}, {title: 1, _id: 0, categories: 1}).limit(1500000).skip(0).batchSize(500).stream();

elasticbulk.import(stream, {
  index: 'my_index_name',
  type: 'my_type_name',
  host: 'localhost:9200',
})
.then(function(res) {
  console.log('Importing finished');
})
.catch(function(err) {
  console.error('Importing failed', err);
});

Ship your data:

node script.js

It's not extremely fast, but it works for millions of records (thanks to streams).


Here is another good option for migrating your MongoDB data to Elasticsearch: Monstache, a Go daemon that syncs MongoDB to Elasticsearch in real time. It's available at https://github.com/rwynn/monstache.

Below are the initial steps to configure and use it.

Step 1:

C:\Program Files\MongoDB\Server\4.0\bin>mongod --smallfiles --oplogSize 50 --replSet test

Step 2:

C:\Program Files\MongoDB\Server\4.0\bin>mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Server has startup warnings:
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
MongoDB Enterprise test:PRIMARY>
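Note: the test:PRIMARY prompt above assumes the replica set has already been initiated. On a fresh deployment, rs.status() will report that no configuration exists yet; in that case initiate it first from the mongo shell (a minimal sketch, assuming the single local member shown above):

// run once in the mongo shell before checking rs.status()
rs.initiate({ _id: "test", members: [{ _id: 0, host: "localhost:27017" }] })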

Step 3: Verify the replication.

MongoDB Enterprise test:PRIMARY> rs.status();
{
        "set" : "test",
        "date" : ISODate("2019-01-18T11:39:00.380Z"),
        "myState" : 1,
        "term" : NumberLong(2),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                }
        },
        "lastStableCheckpointTimestamp" : Timestamp(1547811517, 1),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 736,
                        "optime" : {
                                "ts" : Timestamp(1547811537, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDate" : ISODate("2019-01-18T11:38:57Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1547810805, 1),
                        "electionDate" : ISODate("2019-01-18T11:26:45Z"),
                        "configVersion" : 1,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1547811537, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1547811537, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
MongoDB Enterprise test:PRIMARY>

Step 4: Download Monstache from https://github.com/rwynn/monstache/releases. Unzip the download and add the folder for your platform to your PATH variable. Open a cmd prompt and run "monstache -v" to verify the install (e.g. 4.13.1). Monstache uses the TOML format for its configuration. Create a configuration file for the migration named config.toml.

Step 5.

My config.toml:

mongo-url = "mongodb://127.0.0.1:27017/?replicaSet=test"
elasticsearch-urls = ["http://localhost:9200"]

direct-read-namespaces = [ "admin.users" ]

gzip = true
stats = true
index-stats = true

elasticsearch-max-conns = 4
elasticsearch-max-seconds = 5
elasticsearch-max-bytes = 8000000 

dropped-collections = false
dropped-databases = false

resume = true
resume-write-unsafe = true
resume-name = "default"
index-files = false
file-highlighting = false
verbose = true
exit-after-direct-reads = false

index-as-update=true
index-oplog-time=true

Step 6.

D:\15-1-19>monstache -f config.toml

Monstache is now running.

Confirm the migrated data in Elasticsearch. Then add a record in MongoDB; Monstache captures the event and migrates it to Elasticsearch.
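To double-check from Node.js, you can query the index directly. By default Monstache names the index after the MongoDB namespace, so for the direct-read-namespaces above that would be admin.users; this index name is an assumption based on Monstache's defaults, so adjust it if you configured custom index mappings.

const http = require('http');

// dump whatever landed in the namespace-named index
http.get('http://localhost:9200/admin.users/_search?pretty', (res) => {
  res.pipe(process.stdout);
});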


I found mongo-connector useful. It is from Mongo Labs (MongoDB Inc.) and can now be used with Elasticsearch 2.x.

Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target, then tails the MongoDB oplog, keeping up with operations in MongoDB in real time. It has been tested with Python 2.6, 2.7, and 3.3+. Detailed documentation is available on the wiki.

https://github.com/mongodb-labs/mongo-connector
https://github.com/mongodb-labs/mongo-connector/wiki/Usage%20with%20ElasticSearch
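As a rough sketch of the usage (assuming mongo-connector and the Elastic 2.x doc manager are installed via pip, and that MongoDB is already running as a replica set):

pip install 'mongo-connector[elastic2]'
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager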


Using river can present issues when your operation scales up. River will use a ton of memory under heavy load. I recommend implementing your own Elasticsearch indexing instead; if you're using mongoose, you can build the Elasticsearch indexing right into your models, or use mongoosastic, which essentially does this for you.
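For example, a rough mongoosastic sketch (the schema, field names, and connection settings here are placeholders, not something from the original answer):

const mongoose = require('mongoose');
const mongoosastic = require('mongoosastic');

const ArticleSchema = new mongoose.Schema({
  title: String,
  body: String
});

// mirror saves/removes of this model into Elasticsearch
ArticleSchema.plugin(mongoosastic, { host: 'localhost', port: 9200 });

const Article = mongoose.model('Article', ArticleSchema);

// full-text search goes straight to Elasticsearch
Article.search({ query_string: { query: 'mongodb' } }, (err, results) => {
  if (err) throw err;
  console.log(results.hits.hits);
});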

Another disadvantage of the MongoDB River is that you'll be stuck using the MongoDB 2.4.x branch and ElasticSearch 0.90.x. You'll start to find that you're missing out on a lot of really nice features, and the MongoDB River project just doesn't produce a usable product fast enough to stay stable. That said, the MongoDB River is definitely not something I'd go into production with. It has posed more problems than it's worth. It will randomly drop writes under heavy load, it will consume lots of memory, and there's no setting to cap that. Additionally, river doesn't update in real time; it reads the oplog from MongoDB, and this can delay updates for as long as 5 minutes in my experience.

We recently had to rewrite a large portion of our project because it was a weekly occurrence that something went wrong with ElasticSearch. We had even gone as far as to hire a DevOps consultant, who also agrees that it's best to move away from River.

UPDATE: Elasticsearch-mongodb-river now supports ES v1.4.0 and MongoDB v2.6.x. However, you'll still likely run into performance problems on heavy insert/update operations, as this plugin tries to read MongoDB's oplog to sync. If a lot of operations have accumulated by the time the lock (or latch, rather) releases, you'll notice extremely high memory usage on your ElasticSearch server. If you plan on having a large operation, river is not a good option. The developers of ElasticSearch still recommend that you manage your own indexes by communicating directly with their API using the client library for your language, rather than using river. That isn't really the purpose of river. Twitter-river is a great example of how river should be used: it's essentially a great way to pull in data from outside sources, but not very reliable for high traffic or internal use.

Also consider that mongodb-river falls behind in version support, as it's not maintained by the ElasticSearch organization; it's maintained by a third party. Development was stuck on the v0.90 branch for a long time after the release of v1.0, and when a version for v1.0 was released it wasn't stable until ElasticSearch released v1.3.0. The supported MongoDB versions also fall behind. You may find yourself in a tight spot when you're looking to move to a later version of each, especially with ElasticSearch under such heavy development and many highly anticipated features on the way. Staying up to date with the latest ElasticSearch has been very important for us, as we rely heavily on constantly improving our search functionality; it's a core part of our product.

All in all, you'll likely get a better product if you do it yourself. It's not that difficult: it's just another database to manage in your code, and it can easily be dropped into your existing models without major refactoring.
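A rough sketch of that do-it-yourself approach, using a mongoose post-save hook and the legacy elasticsearch client (index, type, and field names here are placeholders):

const mongoose = require('mongoose');
const elasticsearch = require('elasticsearch');

const esClient = new elasticsearch.Client({ host: 'localhost:9200' });

const PageSchema = new mongoose.Schema({ title: String, categories: Array });

// index each saved document yourself instead of relying on river and the oplog
PageSchema.post('save', (doc) => {
  esClient.index({
    index: 'pages',
    type: 'page',
    id: String(doc._id),
    body: { title: doc.title, categories: doc.categories }
  }).catch(console.error);
});

const Page = mongoose.model('Page', PageSchema);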