[mongodb] mongodb count num of distinct values per field/key

Is there a query that calculates how many distinct values a field contains in a database?

For example, I have a country field with 8 distinct values (spain, england, france, etc.).

If someone adds more documents with a new country, I would like the query to return 9.

Is there an easier way than group and count?

This question is related to mongodb mongodb-query aggregation-framework

The answer is


MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.

There is a shell db.collection.distinct() helper as well:

> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]

> db.countries.distinct('country').length
4
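
The array returned by distinct() comes back as a single result, so it is subject to the maximum BSON document size. For a field with very many distinct values, one possible alternative is to let the server do the counting with an aggregation. The following is only a sketch under some assumptions: country is a scalar field, the server supports the $count stage (MongoDB 3.4 or newer), and the output name distinctCountries is an arbitrary choice made for this example.

```javascript
// Sketch: count the distinct values of "country" on the server.
// "distinctCountries" is an arbitrary name chosen for this example.
const countDistinctPipeline = [
  // Skip documents that lack the field; otherwise they would form a null group.
  { $match: { country: { $exists: true } } },
  // One output document per distinct value.
  { $group: { _id: "$country" } },
  // Collapse the groups into a single document: { distinctCountries: <n> }.
  { $count: "distinctCountries" }
];

// Only runs inside a MongoDB shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.countries.aggregate(countDistinctPipeline).toArray());
}
```

Unlike distinct(), $group treats an array value as one key rather than one key per element, so the two approaches can disagree on array fields.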

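I wanted a more concise answer and came up with the following, based on the documentation for aggregation and $group. It returns one document per distinct country, together with the number of documents that contain it:

db.countries.aggregate([{"$group": {"_id": "$country", "count": {"$sum": 1}}}])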


To find the distinct values of field_1 in a collection while also applying a WHERE-style condition, pass a query document as the second argument:

db.your_collection_name.distinct('field_1', {WHERE condition here, written as a query document})

So, finding the distinct names in a collection where age > 25 looks like this (append .length to the result to get their number):

db.your_collection_name.distinct('names', {'age': {"$gt": 25}})

Hope it helps!
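
Since distinct() returns the values rather than their number, the count asked for in the question is the length of the returned array. A minimal sketch, reusing the placeholder collection and field names from the answer above; the variable names are made up for this example:

```javascript
// Sketch: number of distinct "names" values among documents with age > 25.
// Collection and field names are the placeholders used in the answer above.
const distinctField = "names";
const ageFilter = { age: { $gt: 25 } };

// Only runs inside a MongoDB shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  const distinctNames = db.your_collection_name.distinct(distinctField, ageFilter);
  print(distinctNames.length);
}
```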


I use this query:

var collection = "countries"; var field = "country";
// For each distinct value, count the documents in the same collection that have it
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})

Output:

country, England: 3536
country, France: 238
country, Australia: 1044
country, Spain: 16

This query first retrieves all distinct values and then counts the occurrences of each one, so it issues one count query per distinct value.
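
The same per-value counts can probably be obtained in a single round trip with the $sortByCount aggregation stage, which groups by an expression and sorts the groups by descending count. This is a sketch that assumes MongoDB 3.4 or newer and the countries collection used above; the variable name is made up for this example.

```javascript
// Sketch: per-value occurrence counts in one aggregation instead of
// one count query per distinct value. Requires MongoDB 3.4+.
const perValueCountPipeline = [
  // Equivalent to a $group with a count, followed by a descending $sort.
  { $sortByCount: "$country" }
];

// Only runs inside a MongoDB shell, where the global `db` is defined.
if (typeof db !== "undefined") {
  db.countries.aggregate(perValueCountPipeline).forEach(function (doc) {
    // Each result document has the shape { _id: <value>, count: <n> }.
    print("country, " + doc._id + ": " + doc.count);
  });
}
```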


You can use Mongo Shell Extensions. It is a single .js import that you can append to your $HOME/.mongorc.js, or use programmatically if you are also coding in Node.js/io.js.

Sample

For each distinct value of a field, it counts the occurrences in documents, optionally filtered by a query:

> db.users.distinctAndCount('name', {name: /^a/i})

{
  "Abagail": 1,
  "Abbey": 3,
  "Abbie": 1,
  ...
}

The field parameter can also be an array of fields:

> db.users.distinctAndCount(['name','job'], {name: /^a/i})

{
  "Austin,Educator" : 1,
  "Aurelia,Educator" : 1,
  "Augustine,Carpenter" : 1,
  ...
}
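
To make the shape of that result concrete, here is a plain JavaScript (Node.js) sketch that computes the same kind of value-to-count object over an in-memory array. It is not the extension's actual implementation: the function name distinctAndCountInMemory, the predicate argument, and the sample documents are all made up for illustration.

```javascript
// Hypothetical helper for illustration only; not the extension's code.
// Counts occurrences of each distinct value (or comma-joined tuple of
// values) among the documents accepted by the optional predicate.
function distinctAndCountInMemory(docs, fields, predicate) {
  const fieldList = Array.isArray(fields) ? fields : [fields];
  const keep = predicate || (() => true);
  const counts = new Map();
  for (const doc of docs) {
    if (!keep(doc)) continue;
    // Multiple fields are joined with a comma, as in the sample output above.
    const key = fieldList.map((f) => doc[f]).join(",");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

// Made-up sample documents.
const sampleUsers = [
  { name: "Abbey", job: "Educator" },
  { name: "Abbey", job: "Carpenter" },
  { name: "Bob", job: "Educator" }
];

const byName = distinctAndCountInMemory(
  sampleUsers, "name", (d) => /^a/i.test(d.name));
const byNameAndJob = distinctAndCountInMemory(sampleUsers, ["name", "job"]);

console.log(byName); // { Abbey: 2 }
console.log(byNameAndJob);
```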

Here is an example using the aggregation API. To complicate the case, we group by case-insensitive words from an array property of the document.

db.articles.aggregate([
    {
        $match: {
            keywords: { $not: {$size: 0} }
        }
    },
    { $unwind: "$keywords" },
    {
        $group: {
            _id: {$toLower: '$keywords'},
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);

which gives results such as:

{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }

With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator and a $replaceRoot pipeline stage to get the counts.

For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:

db.users.aggregate([
    { "$group": {
        "_id": { "$toLower": "$role" },
        "count": { "$sum": 1 }
    } },
    { "$group": {
        "_id": null,
        "counts": {
            "$push": { "k": "$_id", "v": "$count" }
        }
    } },
    { "$replaceRoot": {
        "newRoot": { "$arrayToObject": "$counts" }
    } }    
])

Example Output

{
    "user" : 67,
    "superuser" : 5,
    "admin" : 4,
    "moderator" : 12
}
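
For readers unfamiliar with $arrayToObject, the conversion it performs on the array of { k, v } pairs can be approximated in plain JavaScript (Node.js 12 or newer). This is only an analogue to show the data shape, not something to run against the server; the input reuses the values from the example output above, and the variable names are made up.

```javascript
// Shape of "counts" after the second $group stage, using the values
// from the example output above.
const counts = [
  { k: "user", v: 67 },
  { k: "superuser", v: 5 },
  { k: "admin", v: 4 },
  { k: "moderator", v: 12 }
];

// Plain-JavaScript analogue of { $arrayToObject: "$counts" }:
// each { k, v } pair becomes one key/value entry of the result.
const asObject = Object.fromEntries(counts.map(({ k, v }) => [k, v]));
console.log(asObject); // { user: 67, superuser: 5, admin: 4, moderator: 12 }

// The number of distinct roles is the number of keys.
const distinctRoleCount = Object.keys(asObject).length;
console.log(distinctRoleCount); // 4
```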
