[c++] Is there any advantage of using map over unordered_map in case of trivial keys?

A recent talk about unordered_map in C++ made me realize that I should use unordered_map for most cases where I used map before, because of the efficiency of lookup ( amortized O(1) vs. O(log n) ). Most times I use a map, I use either int or std::string as the key type; hence, I've got no problems with the definition of the hash function. The more I thought about it, the more I came to realize that I can't find any reason of using a std::map over a std::unordered_map in the case of keys with simple types -- I took a look at the interfaces, and didn't find any significant differences that would impact my code.

Hence the question: is there any real reason to use std::map over std::unordered_map in the case of simple types like int and std::string?

I'm asking from a strictly programming point of view -- I know that it's not fully considered standard, and that it may pose problems with porting.

Also, I expect that one of the correct answers might be "it's more efficient for smaller sets of data" because of a smaller overhead (is that true?) -- hence I'd like to restrict the question to cases where the amount of keys is non-trivial (>1 024).

Edit: duh, I forgot the obvious (thanks GMan!) -- yes, maps are ordered of course -- I know that, and am looking for other reasons.

This question is related to c++ performance dictionary unordered-map

The answer is


Small addition to all of above:

Better use map, when you need to get elements by range, as they are sorted and you can just iterate over them from one boundary to another.


I've made a test recently which makes 50000 merge&sort. That means if the string keys are the same, merge the byte string. And the final output should be sorted. So this includes a look up for every insertion.

For the map implementation, it takes 200 ms to finish the job. For the unordered_map + map, it takes 70 ms for unordered_map insertion and 80 ms for map insertion. So the hybrid implementation is 50 ms faster.

We should think twice before we use the map. If you only need the data to be sorted in the final result of your program, a hybrid solution may be better.


If you want to compare the speed of your std::map and std::unordered_map implementations, you could use Google's sparsehash project which has a time_hash_map program to time them. For example, with gcc 4.4.2 on an x86_64 Linux system

$ ./time_hash_map
TR1 UNORDERED_MAP (4 byte objects, 10000000 iterations):
map_grow              126.1 ns  (27427396 hashes, 40000000 copies)  290.9 MB
map_predict/grow       67.4 ns  (10000000 hashes, 40000000 copies)  232.8 MB
map_replace            22.3 ns  (37427396 hashes, 40000000 copies)
map_fetch              16.3 ns  (37427396 hashes, 40000000 copies)
map_fetch_empty         9.8 ns  (10000000 hashes,        0 copies)
map_remove             49.1 ns  (37427396 hashes, 40000000 copies)
map_toggle             86.1 ns  (20000000 hashes, 40000000 copies)

STANDARD MAP (4 byte objects, 10000000 iterations):
map_grow              225.3 ns  (       0 hashes, 20000000 copies)  462.4 MB
map_predict/grow      225.1 ns  (       0 hashes, 20000000 copies)  462.6 MB
map_replace           151.2 ns  (       0 hashes, 20000000 copies)
map_fetch             156.0 ns  (       0 hashes, 20000000 copies)
map_fetch_empty         1.4 ns  (       0 hashes,        0 copies)
map_remove            141.0 ns  (       0 hashes, 20000000 copies)
map_toggle             67.3 ns  (       0 hashes, 20000000 copies)

I was intrigued by the answer from @Jerry Coffin, which suggested that the ordered map would exhibit performance increases on long strings, after some experimentation (which can be downloaded from pastebin), I've found that this seems to hold true only for collections of random strings, when the map is initialised with a sorted dictionary (which contain words with considerable amounts of prefix-overlap), this rule breaks down, presumably because of the increased tree depth necessary to retrieve value. The results are shown below, the 1st number column is insert time, 2nd is fetch time.

g++ -g -O3 --std=c++0x   -c -o stdtests.o stdtests.cpp
g++ -o stdtests stdtests.o
gmurphy@interloper:HashTests$ ./stdtests
# 1st number column is insert time, 2nd is fetch time
 ** Integer Keys ** 
 unordered:      137      15
   ordered:      168      81
 ** Random String Keys ** 
 unordered:       55      50
   ordered:       33      31
 ** Real Words Keys ** 
 unordered:      278      76
   ordered:      516     298

From: http://www.cplusplus.com/reference/map/map/

"Internally, the elements in a map are always sorted by its key following a specific strict weak ordering criterion indicated by its internal comparison object (of type Compare).

map containers are generally slower than unordered_map containers to access individual elements by their key, but they allow the direct iteration on subsets based on their order."


I would just point out that... there are many kind of unordered_maps.

Look up the Wikipedia Article on hash map. Depending on which implementation was used, the characteristics in term of look-up, insertion and deletion might vary quite significantly.

And that's what worries me the most with the addition of unordered_map to the STL: they will have to choose a particular implementation as I doubt they'll go down the Policy road, and so we will be stuck with an implementation for the average use and nothing for the other cases...

For example some hash maps have linear rehashing, where instead of rehashing the whole hash map at once, a portion is rehash at each insertion, which helps amortizing the cost.

Another example: some hash maps use a simple list of nodes for a bucket, others use a map, others don't use nodes but find the nearest slot and lastly some will use a list of nodes but reorder it so that the last accessed element is at the front (like a caching thing).

So at the moment I tend to prefer the std::map or perhaps a loki::AssocVector (for frozen data sets).

Don't get me wrong, I'd like to use the std::unordered_map and I may in the future, but it's difficult to "trust" the portability of such a container when you think of all the ways of implementing it and the various performances that result of this.


Reasons have been given in other answers; here is another.

std::map (balanced binary tree) operations are amortized O(log n) and worst case O(log n). std::unordered_map (hash table) operations are amortized O(1) and worst case O(n).

How this plays out in practice is that the hash table "hiccups" every once in a while with an O(n) operation, which may or may not be something your application can tolerate. If it can't tolerate it, you'd prefer std::map over std::unordered_map.


Hash tables have higher constants than common map implementations, which become significant for small containers. Max size is 10, 100, or maybe even 1,000 or more? Constants are the same as ever, but O(log n) is close to O(k). (Remember logarithmic complexity is still really good.)

What makes a good hash function depends on your data's characteristics; so if I don't plan on looking at a custom hash function (but can certainly change my mind later, and easily since I typedef damn near everything) and even though defaults are chosen to perform decently for many data sources, I find the ordered nature of map to be enough of a help initially that I still default to map rather than a hash table in that case.

Plus that way you don't have to even think about writing a hash function for other (usually UDT) types, and just write op< (which you want anyway).


Summary

Assuming ordering is not important:

  • If you are going to build large table once and do lots of queries, use std::unordered_map
  • If you are going to build small table (may be under 100 elements) and do lots of queries, use std::map. This is because reads on it are O(log n).
  • If you are going to change table a lot then may be std::map is good option.
  • If you are in doubt, just use std::unordered_map.

Historical Context

In most languages, unordered map (aka hash based dictionaries) are the default map however in C++ you get ordered map as default map. How did that happen? Some people erroneously assume that C++ committee made this decision in their unique wisdom but the truth is unfortunately uglier than that.

It is widely believed that C++ ended up with ordered map as default because there are not too many parameters on how they can be implemented. On the other hand, hash based implementations has tons of things to talk about. So to avoid gridlocks in standardization they just got along with ordered map. Around 2005, many languages already had good implementations of hash based implementation and so it was more easier for the committee to accept new std::unordered_map. In a perfect world, std::map would have been unordered and we would have std::ordered_map as separate type.

Performance

Below two graphs should speak for themselves (source):

enter image description here

enter image description here


I'd echo roughly the same point GMan made: depending on the type of use, std::map can be (and often is) faster than std::tr1::unordered_map (using the implementation included in VS 2008 SP1).

There are a few complicating factors to keep in mind. For example, in std::map, you're comparing keys, which means you only ever look at enough of the beginning of a key to distinguish between the right and left sub-branches of the tree. In my experience, nearly the only time you look at an entire key is if you're using something like int that you can compare in a single instruction. With a more typical key type like std::string, you often compare only a few characters or so.

A decent hash function, by contrast, always looks at the entire key. IOW, even if the table lookup is constant complexity, the hash itself has roughly linear complexity (though on the length of the key, not the number of items). With long strings as keys, an std::map might finish a search before an unordered_map would even start its search.

Second, while there are several methods of resizing hash tables, most of them are pretty slow -- to the point that unless lookups are considerably more frequent than insertions and deletions, std::map will often be faster than std::unordered_map.

Of course, as I mentioned in the comment on your previous question, you can also use a table of trees. This has both advantages and disadvantages. On one hand, it limits the worst case to that of a tree. It also allows fast insertion and deletion, because (at least when I've done it) I've used a fixed-size of table. Eliminating all table resizing allows you to keep your hash table a lot simpler and typically faster.

One other point: the requirements for hashing and tree-based maps are different. Hashing obviously requires a hash function, and an equality comparison, where ordered maps require a less-than comparison. Of course the hybrid I mentioned requires both. Of course, for the common case of using a string as the key, this isn't really a problem, but some types of keys suit ordering better than hashing (or vice versa).


Significant differences that have not really been adequately mentioned here:

  • map keeps iterators to all elements stable, in C++17 you can even move elements from one map to the other without invalidating iterators to them (and if properly implemented without any potential allocation).
  • map timings for single operations are typically more consistent since they never need large allocations.
  • unordered_map using std::hash as implemented in the libstdc++ is vulnerable to DoS if fed with untrusted input (it uses MurmurHash2 with a constant seed - not that seeding would really help, see https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/).
  • Being ordered enables efficient range searches, e.g. iterate over all elements with key = 42.

Examples related to c++

Method Call Chaining; returning a pointer vs a reference? How can I tell if an algorithm is efficient? Difference between opening a file in binary vs text How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Install Qt on Ubuntu #include errors detected in vscode Cannot open include file: 'stdio.h' - Visual Studio Community 2017 - C++ Error How to fix the error "Windows SDK version 8.1" was not found? Visual Studio 2017 errors on standard headers How do I check if a Key is pressed on C++

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to dictionary

JS map return object python JSON object must be str, bytes or bytearray, not 'dict Python update a key in dict if it doesn't exist How to update the value of a key in a dictionary in Python? How to map an array of objects in React C# Dictionary get item by index Are dictionaries ordered in Python 3.6+? Split / Explode a column of dictionaries into separate columns with pandas Writing a dictionary to a text file? enumerate() for dictionary in python

Examples related to unordered-map

C++ unordered_map using a custom class type as the key Is there any advantage of using map over unordered_map in case of trivial keys?