River is a good solution if you want near real-time synchronization and a general-purpose setup.
If you already have data in MongoDB and just want to ship it to Elasticsearch in one shot, you can try my Node.js package: https://github.com/itemsapi/elasticbulk.
It uses Node.js streams, so you can import data from anything that supports streams (e.g. MongoDB, PostgreSQL, MySQL, JSON files, etc.); there is a JSON file sketch at the end of this answer.
Example for MongoDB to Elasticsearch:
Install packages:
npm install elasticbulk
npm install mongoose
npm install bluebird
Create a script, e.g. script.js:
const elasticbulk = require('elasticbulk');
const mongoose = require('mongoose');
const Promise = require('bluebird');

mongoose.connect('mongodb://localhost/your_database_name', {
  useMongoClient: true
});

mongoose.Promise = Promise;

// Model bound to your existing collection
// (the third argument is the collection name)
var Page = mongoose.model('Page', new mongoose.Schema({
  title: String,
  categories: Array
}), 'your_collection_name');

// Stream the query instead of loading everything into memory;
// batchSize controls how many documents MongoDB fetches per round trip
var stream = Page.find({}, { title: 1, _id: 0, categories: 1 })
  .limit(1500000)
  .skip(0)
  .batchSize(500)
  .stream(); // on newer Mongoose versions use .cursor() instead

elasticbulk.import(stream, {
  index: 'my_index_name',
  type: 'my_type_name',
  host: 'localhost:9200',
})
.then(function(res) {
  console.log('Importing finished');
});
Ship your data:
node script.js
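Once the script finishes, you can sanity-check the import with a document count (assuming a default local Elasticsearch; adjust the index name to match your script):

curl 'localhost:9200/my_index_name/_count'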
It's not extremely fast, but it works for millions of records (thanks to streams).
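Since elasticbulk just consumes a readable stream, the same approach works for plain JSON files. Here is a minimal sketch, assuming your file holds a top-level JSON array; the items.json filename and the my_json_index index name are illustrative, and it needs the JSONStream package (npm install JSONStream):

const elasticbulk = require('elasticbulk');
const fs = require('fs');
const JSONStream = require('JSONStream');

// JSONStream.parse('*') emits each element of the top-level array as an object
var stream = fs.createReadStream('./items.json')
  .pipe(JSONStream.parse('*'));

elasticbulk.import(stream, {
  index: 'my_json_index',
  type: 'my_type_name',
  host: 'localhost:9200'
})
.then(function(res) {
  console.log('JSON import finished');
});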