[c] Implementing a HashMap in C

How to go about creating a Hashmap in C from scratch as is present in C++ STL?

What parameters would be taken into consideration and how would you test the hashmap? As in, what would the benchmark test cases be which you would run before you could say that your hashmap is complete?

This question is related to c data-structures hashmap

The answer is


The best approach depends on the expected key distribution and number of collisions. If relatively few collisions are expected, it really doesn't matter which method is used. If lots of collisions are expected, then which to use depends on the cost of rehashing or probing vs. manipulating the extensible bucket data structure.

But here is source code example of An Hashmap Implementation in C


There are other mechanisms to handle overflow than the simple minded linked list of overflow entries which e.g. wastes a lot of memory.

Which mechanism to use depends among other things on if you can choose the hash function and possible pick more than one (to implement e.g. double hashing to handle collisions); if you expect to often add items or if the map is static once filled; if you intend to remove items or not; ...

The best way to implement this is to first think about all these parameters and then not code it yourself but to pick a mature existing implementation. Google has a few good implementations -- e.g. http://code.google.com/p/google-sparsehash/


The primary goal of a hashmap is to store a data set and provide near constant time lookups on it using a unique key. There are two common styles of hashmap implementation:

  • Separate chaining: one with an array of buckets (linked lists)
  • Open addressing: a single array allocated with extra space so index collisions may be resolved by placing the entry in an adjacent slot.

Separate chaining is preferable if the hashmap may have a poor hash function, it is not desirable to pre-allocate storage for potentially unused slots, or entries may have variable size. This type of hashmap may continue to function relatively efficiently even when the load factor exceeds 1.0. Obviously, there is extra memory required in each entry to store linked list pointers.

Hashmaps using open addressing have potential performance advantages when the load factor is kept below a certain threshold (generally about 0.7) and a reasonably good hash function is used. This is because they avoid potential cache misses and many small memory allocations associated with a linked list, and perform all operations in a contiguous, pre-allocated array. Iteration through all elements is also cheaper. The catch is hashmaps using open addressing must be reallocated to a larger size and rehashed to maintain an ideal load factor, or they face a significant performance penalty. It is impossible for their load factor to exceed 1.0.

Some key performance metrics to evaluate when creating a hashmap would include:

  • Maximum load factor
  • Average collision count on insertion
  • Distribution of collisions: uneven distribution (clustering) could indicate a poor hash function.
  • Relative time for various operations: put, get, remove of existing and non-existing entries.

Here is a flexible hashmap implementation I made. I used open addressing and linear probing for collision resolution.

https://github.com/DavidLeeds/hashmap


Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to data-structures

Program to find largest and second largest number in array golang why don't we have a set datastructure How to initialize a vector with fixed length in R C compiling - "undefined reference to"? List of all unique characters in a string? Binary Search Tree - Java Implementation How to clone object in C++ ? Or Is there another solution? How to check queue length in Python Difference between "Complete binary tree", "strict binary tree","full binary Tree"? Write code to convert given number into words (eg 1234 as input should output one thousand two hundred and thirty four)

Examples related to hashmap

How to split a string in two and store it in a field Printing a java map Map<String, Object> - How? Hashmap with Streams in Java 8 Streams to collect value of Map How to convert String into Hashmap in java Convert object array to hash map, indexed by an attribute value of the Object HashMap - getting First Key value The type java.util.Map$Entry cannot be resolved. It is indirectly referenced from required .class files Sort Go map values by keys Print all key/value pairs in a Java ConcurrentHashMap creating Hashmap from a JSON String