An alternative that has not been mentioned, but that may be a more efficient option for some (won't work with NULL entries) is to use a sparse index (entries in the index only exist when there is something in the field). Here is a sample data set:
db.foo.find()
{ "_id" : ObjectId("544540b31b5cf91c4893eb94"), "imageUrl" : "http://example.com/foo.jpg" }
{ "_id" : ObjectId("544540ba1b5cf91c4893eb95"), "imageUrl" : "http://example.com/bar.jpg" }
{ "_id" : ObjectId("544540c51b5cf91c4893eb96"), "imageUrl" : "http://example.com/foo.png" }
{ "_id" : ObjectId("544540c91b5cf91c4893eb97"), "imageUrl" : "http://example.com/bar.png" }
{ "_id" : ObjectId("544540ed1b5cf91c4893eb98"), "otherField" : 1 }
{ "_id" : ObjectId("544540f11b5cf91c4893eb99"), "otherField" : 2 }
Now, create the sparse index on imageUrl field:
db.foo.ensureIndex( { "imageUrl": 1 }, { sparse: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Now, there is always a chance (and in particular with a small data set like my sample) that rather than using an index, MongoDB will use a table scan, even for a potential covered index query. As it turns out that gives me an easy way to illustrate the difference here:
db.foo.find({}, {_id : 0, imageUrl : 1})
{ "imageUrl" : "http://example.com/foo.jpg" }
{ "imageUrl" : "http://example.com/bar.jpg" }
{ "imageUrl" : "http://example.com/foo.png" }
{ "imageUrl" : "http://example.com/bar.png" }
{ }
{ }
OK, so the extra documents with no imageUrl
are being returned, just empty, not what we wanted. Just to confirm why, do an explain:
db.foo.find({}, {_id : 0, imageUrl : 1}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 6,
"nscannedObjects" : 6,
"nscanned" : 6,
"nscannedObjectsAllPlans" : 6,
"nscannedAllPlans" : 6,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"server" : "localhost:31100",
"filterSet" : false
}
So, yes, a BasicCursor
equals a table scan, it did not use the index. Let's force the query to use our sparse index with a hint()
:
db.foo.find({}, {_id : 0, imageUrl : 1}).hint({imageUrl : 1})
{ "imageUrl" : "http://example.com/bar.jpg" }
{ "imageUrl" : "http://example.com/bar.png" }
{ "imageUrl" : "http://example.com/foo.jpg" }
{ "imageUrl" : "http://example.com/foo.png" }
And there is the result we were looking for - only documents with the field populated are returned. This also only uses the index (i.e. it is a covered index query), so only the index needs to be in memory to return the results.
This is a specialized use case and can't be used generally (see other answers for those options). In particular it should be noted that as things stand you cannot use count()
in this way (for my example it will return 6 not 4), so please only use when appropriate.