Random record from MongoDB

Question

I am looking to get a random record from a huge (100 million record) MongoDB collection. What is the fastest and most efficient way to do so? The data is already there, and there is no field in which I can generate a random number and obtain a random row. Any suggestions?

User · Answer

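To get a fixed number of random docs without duplicates:

first get all ids
get the number of documents
loop, picking a random index and skipping duplicates
fetch the documents whose ids were picked

    var number_of_docs = 7;
    db.collection('preguntas').find({}, {_id: 1}).toArray(function(err, arr) {
        if (err) { console.log(err); return; }
        var count = arr.length;
        var idsram = [];  // _ids of the picked documents
        var rans = [];    // indexes already picked
        // Never ask for more documents than exist, or the loop would never end
        var remaining = Math.min(number_of_docs, count);
        while (remaining != 0) {
            var R = Math.floor(Math.random() * count);
            if (rans.indexOf(R) > -1) {
                continue;
            } else {
                rans.push(R);
                idsram.push(arr[R]._id);
                remaining--;
            }
        }
        // Fetch only the picked documents
        db.collection('preguntas').find({_id: {$in: idsram}}).toArray(function(err1, doc1) {
            if (err1) { console.log(err1); return; }
            res.send(doc1);
        });
    });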
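
As a side note on the loop above: retrying on duplicates gets slow when the number of docs requested is close to the collection size. One alternative (not what the answer uses) is a partial Fisher-Yates shuffle over the index range. This is only a sketch; `pickDistinctIndexes` is a made-up helper name, and it still assumes you can afford an array as long as the id list you already fetched.

```javascript
// Pick k distinct indexes from 0..count-1 with a partial Fisher-Yates shuffle.
// `pickDistinctIndexes` is a hypothetical helper, not part of any MongoDB driver.
function pickDistinctIndexes(count, k, rnd = Math.random) {
  const n = Math.min(k, count); // cannot pick more indexes than exist
  const pool = Array.from({ length: count }, (_, i) => i);
  for (let i = 0; i < n; i++) {
    // Swap a random not-yet-fixed element into position i
    const j = i + Math.floor(rnd() * (count - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

console.log(pickDistinctIndexes(10, 3));
```

The returned indexes could then be mapped to `arr[index]._id` in place of the `while` loop.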

User · Answer

Now you can use aggregate. Example:

    db.users.aggregate([ { $sample: { size: 3 } } ])

See the docs.

User · Answer

I would suggest using map/reduce, where you use the map function to only emit when a random value is below a given probability.

    function mapf() {
        if (Math.random() <= probability) {
            emit(1, this);
        }
    }

    function reducef(key, values) {
        return {"documents": values};
    }

    res = db.questions.mapReduce(mapf, reducef, {"out": {"inline": 1}, "scope": {"probability": 0.5}});
    printjson(res.results);

The reducef function above works because only one key ('1') is emitted from the map function.

The value of the "probability" is defined in the "scope" when invoking mapReduce(...).

Using mapReduce like this should also be usable on a sharded db.

If you want to select exactly n of m documents from the db, you could do it like this:

    function mapf() {
        if (countSubset == 0) return;
        var prob = countSubset / countTotal;
        if (Math.random() <= prob) {
            emit(1, {"documents": [this]});
            countSubset--;
        }
        countTotal--;
    }

    function reducef(key, values) {
        var newArray = new Array();
        for (var i = 0; i < values.length; i++) {
            newArray = newArray.concat(values[i].documents);
        }
        return {"documents": newArray};
    }

    res = db.questions.mapReduce(mapf, reducef, {"out": {"inline": 1}, "scope": {"countTotal": 4, "countSubset": 2}});
    printjson(res.results);

Where "countTotal" (m) is the number of documents in the db, and "countSubset" (n) is the number of documents to retrieve.

This approach might give some problems on sharded databases.
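
To see why the "exactly n of m" map function works, here is the same counting logic run over a plain in-memory array, with no MongoDB involved. It is only a sketch: `selectExactly` is a made-up name, and the array stands in for the documents the map function would visit.

```javascript
// Selection sampling: one pass over the items, picking exactly n of them.
// Mirrors the countSubset / countTotal bookkeeping of the map function.
// `selectExactly` is a hypothetical helper name used for illustration only.
function selectExactly(items, n, rnd = Math.random) {
  const chosen = [];
  let countSubset = Math.min(n, items.length); // still to pick
  let countTotal = items.length;               // still to visit
  for (const item of items) {
    if (countSubset === 0) break;
    // When countSubset === countTotal the probability is 1, so the
    // remaining items are all taken and we always end up with n picks.
    if (rnd() < countSubset / countTotal) {
      chosen.push(item);
      countSubset--;
    }
    countTotal--;
  }
  return chosen;
}

console.log(selectExactly(['a', 'b', 'c', 'd'], 2));
```

This also hints at why it can go wrong on a sharded db: the counters have to be shared across everything that runs the map function, and each shard would get its own copy of the scope.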

User · Answer

It is tough if there is no data there to key off of. What are the _id fields? Are they MongoDB ObjectIds? If so, you could get the highest and lowest values:

    lowest = db.coll.find().sort({_id: 1}).limit(1).next()._id;
    highest = db.coll.find().sort({_id: -1}).limit(1).next()._id;

Then, if you assume the ids are uniformly distributed (they aren't, but at least it's a start):

    unsigned long long L = first_8_bytes_of(lowest)
    unsigned long long H = first_8_bytes_of(highest)

    V = (H - L) * random_from_0_to_1();
    N = L + V;
    oid = N concat random_4_bytes();

    randomobj = db.coll.find({_id: {$gte: oid}}).limit(1);
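
The pseudocode above could look roughly like this in JavaScript, working on the 24-character hex form of the ObjectIds. This is a sketch under assumptions: `randomOidBetween` is a made-up name, the inputs are lowercase hex strings with lowest <= highest, and the random fraction is quantised to one part in a million to keep the arithmetic in BigInt.

```javascript
// Interpolate a random 12-byte id (as 24 hex chars) between two ObjectId hex strings.
// `randomOidBetween` is a hypothetical helper; it does not talk to MongoDB.
function randomOidBetween(lowestHex, highestHex, rnd = Math.random) {
  const L = BigInt('0x' + lowestHex.slice(0, 16));  // first 8 bytes of lowest
  const H = BigInt('0x' + highestHex.slice(0, 16)); // first 8 bytes of highest
  // Random fraction in [0, 1) expressed as an integer out of 1,000,000
  const SCALE = 1000000n;
  const frac = BigInt(Math.floor(rnd() * 1e6));
  const N = L + ((H - L) * frac) / SCALE;
  // 8 interpolated bytes followed by 4 random bytes
  const head = N.toString(16).padStart(16, '0');
  const tail = Math.floor(rnd() * 0x100000000).toString(16).padStart(8, '0');
  return head + tail;
}

console.log(randomOidBetween('5f0000000000000000000000', '60000000ffffffff00000000'));
```

The resulting string would then be wrapped in an ObjectId and used in the `$gte` query shown above.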

User · Answer

My PHP/MongoDB sort/order by RANDOM solution. Hope this helps anyone.

Note: I have numeric IDs within my MongoDB collection that refer to a MySQL database record.

First I create an array with 10 randomly generated numbers:

    $randomNumbers = [];
    for ($i = 0; $i < 10; $i++) {
        $randomNumbers[] = rand(0, 1000);
    }

In my aggregation I use the $addFields pipeline operator combined with $arrayElemAt and $mod (modulus). The modulus operator will give me a number from 0 - 9, which I then use to pick a number from the array of randomly generated numbers.

    $aggregate[] = [
        '$addFields' => [
            'random_sort' => [ '$arrayElemAt' => [ $randomNumbers, [ '$mod' => [ '$my_numeric_mysql_id', 10 ] ] ] ],
        ],
    ];

After that you can use the $sort pipeline stage:

    $aggregate[] = [
        '$sort' => [
            'random_sort' => 1
        ]
    ];

User · Answer

The following recipe is a little slower than the mongo cookbook solution (add a random key on every document), but returns more evenly distributed random documents. It's a little less evenly distributed than the skip( random ) solution, but much faster and more fail-safe in case documents are removed.

    function draw(collection, query) {
        // query: mongodb query object (optional)
        var query = query || { };
        query['random'] = { $lte: Math.random() };
        // sort on the same "random" field that is queried
        var cur = collection.find(query).sort({ random: -1 });
        if (! cur.hasNext()) {
            delete query.random;
            cur = collection.find(query).sort({ random: -1 });
        }
        var doc = cur.next();
        doc.random = Math.random();
        collection.update({ _id: doc._id }, doc);
        return doc;
    }

It also requires you to add a random "random" field to your documents, so don't forget to add this when you create them. You may need to initialize your collection as shown by Geoffrey:

    function addRandom(collection) {
        collection.find().forEach(function (obj) {
            obj.random = Math.random();
            collection.save(obj);
        });
    }
    db.eval(addRandom, db.things);

Benchmark results

This method is much faster than the skip() method (of ceejayoz) and generates more uniformly random documents than the "cookbook" method reported by Michael.

For a collection with 1,000,000 elements:

This method takes less than a millisecond on my machine
the skip() method takes 180 ms on average

The cookbook method will cause large numbers of documents to never get picked because their random number does not favor them.

This method will pick all elements evenly over time.
In my benchmark it was only 30% slower than the cookbook method.
The randomness is not 100% perfect but it is very good (and it can be improved if necessary)

This recipe is not perfect - the perfect solution would be a built-in feature, as others have noted. However, it should be a good compromise for many purposes.

User · Answer

You can pick a random timestamp and search for the first object that was created afterwards. It will only scan a single document, though it doesn't necessarily give you a uniform distribution.

    var randRec = function() {
        // replace with your collection
        var coll = db.collection
        // get unixtime of first and last record
        var min = coll.find().sort({_id: 1}).limit(1)[0]._id.getTimestamp() - 0;
        var max = coll.find().sort({_id: -1}).limit(1)[0]._id.getTimestamp() - 0;

        // allow to pass additional query params
        return function(query) {
            if (typeof query === 'undefined') query = {}
            var randTime = Math.round(Math.random() * (max - min)) + min;
            var hexSeconds = Math.floor(randTime / 1000).toString(16);
            var id = ObjectId(hexSeconds + "0000000000000000");
            query._id = {$gte: id}
            return coll.find(query).limit(1)
        };
    }();

User · Answer

What works efficiently and reliably is this:

Add a field called "random" to each document and assign a random value to it, add an index for the random field, and proceed as follows:

Let's assume we have a collection of web links called "links" and we want a random link from it:

    link = db.links.find().sort({random: 1}).limit(1)[0]

To ensure the same link won't pop up a second time, update its random field with a new random number:

    db.links.update({_id: link._id}, {$set: {random: Math.random()}})

User · Answer

This works nicely: it's fast, works with multiple documents and doesn't require populating the rand field, which will eventually populate itself:

add an index to the .rand field on your collection
use find and refresh, something like:

    // Install packages:
    //   npm install mongodb async
    // Add index in mongo:
    //   db.ensureIndex('mycollection', { rand: 1 })

    var mongodb = require('mongodb')
    var async = require('async')

    // Find n random documents by using "rand" field.
    function findAndRefreshRand (collection, n, fields, done) {
      var result = []
      var rand = Math.random()

      // Append documents to the result based on criteria and options, if options.limit is 0 skip the call.
      var appender = function (criteria, options) {
        return function (done) {
          // Compute the limit when the step runs, so it accounts for docs already fetched.
          options.limit = n - result.length
          if (options.limit > 0) {
            collection.find(criteria, fields, options).toArray(
              function (err, docs) {
                if (!err && Array.isArray(docs)) {
                  Array.prototype.push.apply(result, docs)
                }
                done(err)
              }
            )
          } else {
            async.nextTick(done)
          }
        }
      }

      async.series([

        // Fetch docs with uninitialized .rand.
        // NOTE: You can comment out this step if all docs have initialized .rand = Math.random()
        appender({ rand: { $exists: false } }, {}),

        // Fetch on one side of random number.
        appender({ rand: { $gte: rand } }, { sort: { rand: 1 } }),

        // Continue fetch on the other side.
        appender({ rand: { $lt: rand } }, { sort: { rand: -1 } }),

        // Refresh fetched docs, if any.
        function (done) {
          if (result.length > 0) {
            var batch = collection.initializeUnorderedBulkOp({ w: 0 })
            for (var i = 0; i < result.length; ++i) {
              // $set keeps the rest of the document intact
              batch.find({ _id: result[i]._id }).updateOne({ $set: { rand: Math.random() } })
            }
            batch.execute(done)
          } else {
            async.nextTick(done)
          }
        }

      ], function (err) {
        done(err, result)
      })
    }

    // Example usage
    mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
      if (!err) {
        findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
          if (!err) {
            console.log(result)
          } else {
            console.error(err)
          }
          db.close()
        })
      } else {
        console.error(err)
      }
    })

ps. The "How to find random records in mongodb" question is marked as a duplicate of this question. The difference is that this question asks explicitly about a single record, while the other one asks explicitly about getting random documents.

User · Answer

If you're using mongoid, the document-to-object wrapper, you can do the following in Ruby. (Assuming your model is User)

    User.all.to_a[rand(User.count)]

In my .irbrc, I have

    def rando klass
        klass.all.to_a[rand(klass.count)]
    end

so in rails console, I can do, for example,

    rando User
    rando Article

to get documents randomly from any collection.

User · Answer

MongoDB now has $rand.

To pick n non-repeating items, aggregate with { $addFields: { _f: { $rand: {} } } }, then $sort by _f and $limit n.
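
Spelled out as a pipeline, that could look like the sketch below. `randomOrderPipeline` is a made-up helper name that only builds the array of stages; you would pass the result to `aggregate()` on your own collection. As far as I know, $rand needs a fairly recent server (4.4.2 or later).

```javascript
// Build an aggregation pipeline that tags each document with a random value,
// sorts by it and keeps the first n documents.
// `randomOrderPipeline` is a hypothetical helper; `_f` is a scratch field name.
function randomOrderPipeline(n) {
  return [
    { $addFields: { _f: { $rand: {} } } },
    { $sort: { _f: 1 } },
    { $limit: n },
  ];
}

console.log(JSON.stringify(randomOrderPipeline(3)));
// e.g. db.mycoll.aggregate(randomOrderPipeline(3))
```

Note that sorting on a computed field means the server has to look at every document, so this is more about correctness (no repeats) than speed.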

User · Answer

Starting with the 3.2 release of MongoDB, you can get N random docs from a collection using the $sample aggregation pipeline operator:

    // Get one random document from the mycoll collection.
    db.mycoll.aggregate([{ $sample: { size: 1 } }])

If you want to select the random document(s) from a filtered subset of the collection, prepend a $match stage to the pipeline:

    // Get one random document matching {a: 10} from the mycoll collection.
    db.mycoll.aggregate([
        { $match: { a: 10 } },
        { $sample: { size: 1 } }
    ])

As noted in the comments, when size is greater than 1, there may be duplicates in the returned document sample.
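
If duplicates in the sample are a problem, one simple option is to drop them on the client after the query returns. A minimal sketch, assuming each document has an `_id` whose string form is unique; `dedupeById` is a made-up name, and you may end up with fewer than `size` documents.

```javascript
// Remove documents with a repeated _id, keeping the first occurrence of each.
// `dedupeById` is a hypothetical helper that runs on the query results.
function dedupeById(docs) {
  const seen = new Set();
  const unique = [];
  for (const doc of docs) {
    const key = String(doc._id); // ObjectId and plain values both stringify
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(doc);
    }
  }
  return unique;
}

console.log(dedupeById([{ _id: 1 }, { _id: 2 }, { _id: 1 }]).length);
```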

User · Answer

None of the solutions worked well for me, especially when there are many gaps and the set is small. This worked very well for me (in PHP):

    $count = $collection->count($search);
    $skip = mt_rand(0, $count - 1);
    $result = $collection->find($search)->skip($skip)->limit(1)->getNext();

User · Answer

In Python using pymongo:

    import random

    def get_random_doc():
        # count_documents() replaces the older collection.count()
        count = collection.count_documents({})
        return collection.find()[random.randrange(count)]

User · Answer

I'd suggest adding a random int field to each object. Then you can just do a

    findOne({random_field: {$gte: rand()}})

to pick a random document. Just make sure you ensureIndex({random_field: 1}).

User · Answer

Actually, contrary to the other answers, $sample might not be the fastest solution, because mongo may do a collection scan for the random sort when using $sample, depending on the situation. Please see the reference: https://docs.mongodb.com/manual/reference/operator/aggregation/sample/

Counting the result set and then doing a random skip/take may do better.

User · Answer

Do a count of all records, generate a random number between 0 and the count, and then do:

    db.yourCollection.find().limit(-1).skip(yourRandomNumber).next()
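
One detail that is easy to get wrong here is the range of the random number: it has to be an integer from 0 to count - 1 inclusive, otherwise the skip can run past the last document. A small sketch, where `randomSkipOffset` is a made-up helper name:

```javascript
// Return a uniformly random integer offset in [0, count - 1].
// `randomSkipOffset` is a hypothetical helper; pass its result to skip().
function randomSkipOffset(count, rnd = Math.random) {
  if (!Number.isInteger(count) || count <= 0) {
    throw new RangeError('count must be a positive integer');
  }
  // Math.random() is in [0, 1), so the floor never reaches count itself
  return Math.floor(rnd() * count);
}

console.log(randomSkipOffset(1000));
```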

User · Answer

Using Python (pymongo), the aggregate function also works.

    collection.aggregate([{'$sample': {'size': sample_size}}])

This approach is a lot faster than running a query for a random number (e.g. collection.find([random_int])). This is especially the case for large collections.

User · Answer

Using Map/Reduce, you can certainly get a random record, just not necessarily very efficiently, depending on the size of the resulting filtered collection you end up working with.

I've tested this method with 50,000 documents (the filter reduces it to about 30,000), and it executes in approximately 400ms on an Intel i3 with 16GB RAM and a SATA3 HDD...

    db.toc_content.mapReduce(
        /* map function */
        function() { emit( 1, this._id ); },

        /* reduce function */
        function(k,v) {
            var r = Math.floor((Math.random()*v.length));
            return v[r];
        },

        /* options */
        {
            out: { inline: 1 },
            /* Filter the collection to "A"ctive documents */
            query: { status: "A" }
        }
    );

The Map function simply creates an array of the ids of all documents that match the query. In my case I tested this with approximately 30,000 out of the 50,000 possible documents.

The Reduce function simply picks a random integer between 0 and the number of items (-1) in the array, and then returns that _id from the array.

400ms sounds like a long time, and it really is. If you had fifty million records instead of fifty thousand, this may increase the overhead to the point where it becomes unusable in multi-user situations.

There is an open issue for MongoDB to include this feature in the core: https://jira.mongodb.org/browse/SERVER-533

If this "random" selection was built into an index lookup instead of collecting ids into an array and then selecting one, this would help incredibly. (Go vote it up!)

User · Answer

Update for MongoDB 3.2

3.2 introduced $sample to the aggregation pipeline.

There's also a good blog post on putting it into practice.

For older versions (previous answer)

This was actually a feature request: http://jira.mongodb.org/browse/SERVER-533 but it was filed under "Won't fix."

The cookbook has a very good recipe to select a random document out of a collection: http://cookbook.mongodb.org/patterns/random-attribute/

To paraphrase the recipe, you assign random numbers to your documents:

    db.docs.save( { key : 1, ..., random : Math.random() } )

Then select a random document:

    rand = Math.random()
    result = db.docs.findOne( { key : 2, random : { $gte : rand } } )
    if ( result == null ) {
        result = db.docs.findOne( { key : 2, random : { $lte : rand } } )
    }

Querying with both $gte and $lte is necessary to find the document with a random number nearest rand.

And of course you'll want to index on the random field:

    db.docs.ensureIndex( { key : 1, random : 1 } )

If you're already querying against an index, simply drop it, append random: 1 to it, and add it again.
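
To make the $gte / $lte fallback concrete, here is the same lookup emulated over an in-memory array. This is only an illustration: `pickByRandomKey` is a made-up name, the sort stands in for the index on `random`, and the fallback taking the nearest value below `rand` is my reading of the recipe, not something the cookbook spells out.

```javascript
// Emulate the cookbook lookup on an array of documents with a `random` field.
// `pickByRandomKey` is a hypothetical helper; no database is involved.
function pickByRandomKey(docs, rand = Math.random()) {
  // Sorting plays the role of the index on `random`
  const sorted = [...docs].sort((a, b) => a.random - b.random);
  // First try: smallest `random` that is >= rand
  const atOrAbove = sorted.find((d) => d.random >= rand);
  if (atOrAbove !== undefined) return atOrAbove;
  // Fallback: nothing at or above rand, so take the largest value below it
  return sorted.length > 0 ? sorted[sorted.length - 1] : null;
}

const docs = [{ key: 'a', random: 0.1 }, { key: 'b', random: 0.5 }, { key: 'c', random: 0.9 }];
console.log(pickByRandomKey(docs, 0.3).key);
```

It also shows the weakness other answers mention: a document is picked with probability equal to the gap below its `random` value, so documents whose values happen to sit close above a neighbour are rarely chosen.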

User · Answer

You can also use MongoDB's geospatial indexing feature to select the documents 'nearest' to a random number.

First, enable geospatial indexing on a collection:

    db.docs.ensureIndex( { random_point: '2d' } )

To create a bunch of documents with random points on the X-axis:

    for ( i = 0; i < 10; ++i ) {
        db.docs.insert( { key: i, random_point: [Math.random(), 0] } );
    }

Then you can get a random document from the collection like this:

    db.docs.findOne( { random_point : { $near : [Math.random(), 0] } } )

Or you can retrieve several documents nearest to a random point:

    db.docs.find( { random_point : { $near : [Math.random(), 0] } } ).limit( 4 )

This requires only one query and no null checks, plus the code is clean, simple and flexible. You could even use the Y-axis of the geopoint to add a second randomness dimension to your query.

User · Answer

If you have a simple id key, you could store all the ids in an array, and then pick a random id. (Ruby answer):

    ids = @coll.find({}, fields: {_id: 1}).to_a
    @coll.find(ids.sample).first

User · Answer

The following aggregation operation randomly selects 3 documents from the collection:

    db.users.aggregate( [ { $sample: { size: 3 } } ] )

https://docs.mongodb.com/manual/reference/operator/aggregation/sample/

User · Answer

You can also use shuffle-array after executing your query:

    var shuffle = require('shuffle-array');

    Accounts.find(qry, function(err, results_array) {
        newIndexArr = shuffle(results_array);
    });

User · Answer

If you are using mongoose, then you can use mongoose-random.

User · Answer

When I was faced with a similar problem, I backtracked and found that the business request was actually for creating some form of rotation of the inventory being presented. In that case, there are much better options, which have answers from search engines like Solr, not data stores like MongoDB.

In short, with the requirement to "intelligently rotate" content, what we should do instead of a random number across all of the documents is to include a personal q score modifier. To implement this yourself, assuming a small population of users, you can store a document per user that has the productId, impression count, click-through count, last seen date, and whatever other factors the business finds meaningful for computing a q score modifier. When retrieving the set to display, you typically request more documents from the data store than requested by the end user, then apply the q score modifier, take the number of records requested by the end user, then randomize the page of results. That is a tiny set, so simply sort the documents in the application layer (in memory).

If the universe of users is too large, you can categorize users into behavior groups and index by behavior group rather than user.

If the universe of products is small enough, you can create an index per user.

I have found this technique to be much more efficient, but more importantly more effective in creating a relevant, worthwhile experience of using the software solution.

User · Answer

My solution in PHP:

    /**
     * Get random docs from Mongo
     * @param $collection
     * @param $where
     * @param $fields
     * @param $limit
     * @author happy-code
     * @url happy-code.com
     */
    private function _mongodb_get_random (MongoCollection $collection, $where = array(), $fields = array(), $limit = false) {

        // Total docs
        $count = $collection->find($where, $fields)->count();

        if (!$limit) {
            // Get all docs
            $limit = $count;
        }

        $data = array();
        for ($i = 0; $i < $limit; $i++) {

            // Skip documents
            $skip = rand(0, ($count-1));
            if ($skip !== 0) {
                $doc = $collection->find($where, $fields)->skip($skip)->limit(1)->getNext();
            } else {
                $doc = $collection->find($where, $fields)->limit(1)->getNext();
            }

            if (is_array($doc)) {
                // Catch document
                $data[ $doc['_id']->{'$id'} ] = $doc;
                // Ignore current document when making the next iteration
                $where['_id']['$nin'][] = $doc['_id'];
            }

            // Every iteration catch document and decrease in the total number of document
            $count--;

        }

        return $data;
    }

User · Answer

Here is a way using the default ObjectId values for _id and a little math and logic.

    // Get the "min" and "max" timestamp values from the _id in the collection and the
    // diff between.
    // 4-bytes from a hex string is 8 characters

    var min = parseInt(db.collection.find()
            .sort({ "_id": 1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
        max = parseInt(db.collection.find()
            .sort({ "_id": -1 }).limit(1).toArray()[0]._id.str.substr(0,8),16)*1000,
        diff = max - min;

    // Get a random value from diff and divide/multiply by 1000 for the "_id" precision:
    var random = Math.floor(Math.floor(Math.random()*diff)/1000)*1000;

    // Use "random" in the range and pad the hex string to a valid ObjectId
    var _id = new ObjectId(((min + random)/1000).toString(16) + "0000000000000000")

    // Then query for the single document:
    var randomDoc = db.collection.find({ "_id": { "$gte": _id } })
        .sort({ "_id": 1 }).limit(1).toArray()[0];

That's the general logic in shell representation and easily adaptable.

So in points:

Find the min and max primary key values in the collection
Generate a random number that falls between the timestamps of those documents.
Add the random number to the minimum value and find the first document that is greater than or equal to that value.

This uses "padding" from the timestamp value in "hex" to form a valid ObjectId value, since that is what we are looking for. Using integers as the _id value is essentially simpler, but it is the same basic idea as in the points.
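
The second and third points can be tried without a database. The sketch below builds the padded hex string for a random second between two timestamps; `randomTimestampObjectIdHex` is a made-up name, and the inputs are assumed to be whole seconds since the epoch with minSec <= maxSec.

```javascript
// Build a 24-char ObjectId hex string whose timestamp part is a random second
// in [minSec, maxSec] and whose remaining 8 bytes are zero padding.
// `randomTimestampObjectIdHex` is a hypothetical helper name.
function randomTimestampObjectIdHex(minSec, maxSec, rnd = Math.random) {
  const sec = minSec + Math.floor(rnd() * (maxSec - minSec + 1));
  // 4 bytes of timestamp are 8 hex characters, then 16 characters of padding
  return sec.toString(16).padStart(8, '0') + '0000000000000000';
}

console.log(randomTimestampObjectIdHex(0x5f000000, 0x5f00ffff));
```

Because the padding is all zeros, a `$gte` query on the result finds the first document created at or after that second.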

User · Answer

You can pick a random _id and return the corresponding object:

    db.collection.count( function(err, count){
        db.collection.distinct( "_id" , function( err, result) {
            if (err)
                return res.send(err)
            // multiply by count (not count-1) so the last id can also be picked
            var randomId = result[Math.floor(Math.random() * count)]
            db.collection.findOne( { _id: randomId } , function( err, result) {
                if (err)
                    return res.send(err)
                console.log(result)
            })
        })
    })

Here you don't need to spend space on storing random numbers in the collection.
