Is a Java hashmap search really O(1)?

I've seen some interesting claims on SO regarding Java hashmaps and their O(1) lookup time. Can someone explain why this is so? Unless these hashmaps are vastly different from any of the hashing algorithms I was brought up on, there must always exist a dataset that contains collisions.

In which case, the lookup would be O(n) rather than O(1).

Can someone explain whether they are O(1) and, if so, how they achieve this?

Tags: java, hashmap, big-o, time-complexity

Answers:


Lookup is O(1 + n/k), where k is the number of buckets.

If the implementation keeps k = n/alpha (that is, the table is resized so that the load factor alpha stays fixed), then lookup is O(1 + alpha) = O(1), since alpha is a constant.
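In Java this bound is what the HashMap constructor's load-factor parameter controls: the table is resized whenever size exceeds capacity * loadFactor, so n/k stays bounded. A minimal sketch (the specific values are just illustrative; 16 and 0.75f happen to be HashMap's defaults):

    import java.util.HashMap;
    import java.util.Map;

    public class LoadFactorDemo {
        public static void main(String[] args) {
            // Start with 16 buckets; the table doubles once size > 16 * 0.75 = 12.
            Map<String, Integer> map = new HashMap<>(16, 0.75f);
            for (int i = 0; i < 100; i++) {
                map.put("key" + i, i); // grows as needed to keep the load factor <= 0.75
            }
            // Average chain length stays bounded by the load factor,
            // so get() remains O(1) on average.
            System.out.println(map.get("key42")); // prints 42
        }
    }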


It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions.

Whether or not the table has to grow when you add items is not relevant here, because the question is about lookup time.


We've established that the standard description of hash table lookups being O(1) refers to the average-case expected time, not the strict worst-case performance. For a hash table that resolves collisions with chaining (like Java's HashMap) this is technically O(1 + alpha) with a good hash function, where alpha is the table's load factor. That is still constant as long as the number of objects you're storing is no more than a constant factor larger than the table size.
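To make the O(1 + alpha) concrete, here is a minimal sketch of a chained hash table (just the textbook structure, not java.util.HashMap's actual code): a lookup costs one hash computation plus a walk along a single chain, whose expected length is alpha.

    import java.util.LinkedList;
    import java.util.List;

    // Illustrative chained hash table; fixed capacity, no resizing.
    class ChainedTable<K, V> {
        private static class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        private final List<Entry<K, V>>[] buckets;

        @SuppressWarnings("unchecked")
        ChainedTable(int capacity) {
            buckets = new List[capacity];
            for (int i = 0; i < capacity; i++) {
                buckets[i] = new LinkedList<>();
            }
        }

        private int indexFor(K key) {
            return Math.floorMod(key.hashCode(), buckets.length); // O(1)
        }

        V get(K key) {
            // With a good hash, the expected chain length is
            // alpha = n / buckets.length, so this loop runs O(1 + alpha) times.
            for (Entry<K, V> e : buckets[indexFor(key)]) {
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }

        void put(K key, V value) {
            for (Entry<K, V> e : buckets[indexFor(key)]) {
                if (e.key.equals(key)) { e.value = value; return; }
            }
            buckets[indexFor(key)].add(new Entry<>(key, value));
        }
    }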

It's also been explained that, strictly speaking, it's possible to construct input that requires O(n) lookups for any deterministic hash function. But it's also interesting to consider the worst-case expected time, which is different from the average search time. Using chaining this is O(1 + the length of the longest chain), for example Θ(log n / log log n) when alpha = 1.
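For a concrete adversarial input in Java: the strings "Aa" and "BB" have the same hashCode (2112), and concatenating such blocks yields exponentially many distinct strings that all collide. A small demonstration:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class CollisionDemo {
        public static void main(String[] args) {
            // 'A' * 31 + 'a' == 'B' * 31 + 'B' == 2112, so these collide:
            System.out.println("Aa".hashCode() == "BB".hashCode()); // true

            // Concatenating n colliding blocks gives 2^n distinct keys
            // that all share a single hash code.
            List<String> keys = new ArrayList<>();
            keys.add("");
            for (int block = 0; block < 3; block++) {
                List<String> next = new ArrayList<>();
                for (String k : keys) {
                    next.add(k + "Aa");
                    next.add(k + "BB");
                }
                keys = next;
            }
            Set<Integer> hashes = new HashSet<>();
            for (String k : keys) hashes.add(k.hashCode());
            // 8 keys, all destined for one bucket of any HashMap:
            System.out.println(keys.size() + " keys, " + hashes.size() + " distinct hash code");
        }
    }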

If you're interested in theoretical ways to achieve constant time expected worst-case lookups, you can read about dynamic perfect hashing which resolves collisions recursively with another hash table!


Elements inside a HashMap are stored in an array of linked lists (nodes); each linked list in the array represents a bucket for the unique hash value of one or more keys.
When adding an entry to the HashMap, the hashcode of the key is used to determine the location of the bucket in the array, something like:

    location = (arrayLength - 1) & keyHashcode

Here & is the bitwise AND operator.

For example, if arrayLength - 1 = 100: 100 & "ABC".hashCode() = 100 & 64578 = 64 (the location of the bucket for the key "ABC").
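For reference, a runnable sketch of that computation (bucketIndex is a made-up helper name; real HashMap table lengths are powers of two, so tableLength - 1 is an all-ones bit mask, and HashMap additionally XORs the high bits of the hashcode into the low bits before masking):

    public class BucketIndexDemo {
        // Mirrors the index computation; the XOR-shift "spreading" step
        // is what java.util.HashMap.hash() does internally.
        static int bucketIndex(Object key, int tableLength) {
            int h = key.hashCode();
            h ^= (h >>> 16);              // fold high bits into low bits
            return (tableLength - 1) & h; // mask: tableLength is a power of two
        }

        public static void main(String[] args) {
            // "ABC".hashCode() is 64578; its low 4 bits are 0010.
            System.out.println(bucketIndex("ABC", 16)); // prints 2
        }
    }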

During the get operation, the same computation is used to determine the bucket for the key. In the best case each key has a unique hashcode and lands in its own bucket; the get method then spends time only on determining the bucket location and retrieving the value, which is constant: O(1).

In the worst case, all the keys have the same hashcode and are stored in the same bucket, which means traversing the entire list and leads to O(n).

As of Java 8, the linked list in a bucket is replaced with a balanced tree (a red-black tree) once the bucket grows past 8 entries (and the table has at least 64 buckets), which improves the worst-case lookup from O(n) to O(log n).
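A sketch of that worst case (BadKey is a made-up class; every instance hashes to the same bucket, so lookups degrade to a chain walk, or to a tree search on Java 8+):

    import java.util.HashMap;
    import java.util.Map;

    public class WorstCaseDemo {
        // Deliberately terrible key: every instance collides,
        // so all entries pile into a single bucket.
        static final class BadKey implements Comparable<BadKey> {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
            // Comparable lets Java 8's treeified bucket order the keys,
            // keeping lookups at O(log n) instead of O(n).
            @Override public int compareTo(BadKey other) {
                return Integer.compare(id, other.id);
            }
        }

        public static void main(String[] args) {
            Map<BadKey, String> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey(i), "value" + i);
            }
            // Still correct, but every get() searches one (treeified) bucket.
            System.out.println(map.get(new BadKey(1234))); // prints value1234
        }
    }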


Remember that O(1) does not mean that each lookup examines only a single item; it means that the average number of items checked remains constant with respect to the number of items in the container. So if it takes on average 4 comparisons to find an item in a container with 100 items, it should also take an average of 4 comparisons to find an item in a container with 10,000 items, and for any other number of items (there's always a bit of variance, especially around the points at which the hash table rehashes, and when there's a very small number of items).

So collisions don't prevent the container from having O(1) operations, as long as the average number of keys per bucket remains within a fixed bound.


In Java, HashMap works by using hashCode to locate a bucket. Each bucket holds a list of the items that hash to it; those items are scanned, using equals for comparison. When adding items, the HashMap is resized once a certain load percentage is reached.

So, sometimes it will have to compare against a few items, but generally it's much closer to O(1) than O(n). For practical purposes, that's all you should need to know.


This basically goes for most hash table implementations in most programming languages, as the algorithm itself doesn't really change.

If there are no collisions present in the table, you only have to do a single look-up, so the running time is O(1). If there are collisions present, you have to do more than one look-up, which degrades performance towards O(n).


If the number of buckets (call it b) is held constant (the usual case), then lookup is actually O(n).
As n gets large, the number of elements in each bucket averages n/b. If collision resolution is done in one of the usual ways (linked list for example), then lookup is O(n/b) = O(n).
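In symbols (a one-line restatement of the argument above, writing the expected lookup time as T(n)):

    T(n) = O(1 + n/b) = O(n)    for fixed b,

since the n/b term grows linearly in n.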

The O notation is about what happens when n gets larger and larger. It can be misleading when applied to certain algorithms, and hash tables are a case in point. We choose the number of buckets based on how many elements we're expecting to deal with. When n is about the same size as b, then lookup is roughly constant-time, but we can't call it O(1) because O is defined in terms of a limit as n → ∞.


I know this is an old question, but there's actually a new answer to it.

You're right that a hash map isn't really O(1), strictly speaking, because as the number of elements gets arbitrarily large, eventually you will not be able to search in constant time (and O-notation is defined in terms of numbers that can get arbitrarily large).

But it doesn't follow that the real time complexity is O(n), because there's no rule that says that the buckets have to be implemented as a linear list.

In fact, Java 8 implements the buckets as red-black trees rather than linked lists once they exceed a threshold, which bounds the actual worst-case time at O(log n).


Of course the performance of the HashMap will depend on the quality of the hashCode() function for the given object. However, if the function is implemented such that the possibility of collisions is very low, it will have very good performance (this is not strictly O(1) in every possible case, but it is in most cases).

For example, the default hashCode() implementation in the Oracle JRE (the one inherited from Object) is to use a random number, which is stored in the object so that it doesn't change; this also disables biased locking, but that's another discussion. With it, the chance of collisions is very low.


Academics aside, from a practical perspective, HashMaps should be accepted as having an inconsequential performance impact (unless your profiler tells you otherwise).


It depends on the algorithm you choose to handle collisions. If your implementation uses separate chaining, then the worst-case scenario is that every data element hashes to the same value (a poor choice of hash function, for example). In that case, lookup is no different from a linear search on a linked list, i.e. O(n). However, the probability of that happening is negligible, and the best and average cases for lookup remain constant, i.e. O(1).


You seem to mix up worst-case behaviour with average-case (expected) runtime. The former is indeed O(n) for hash tables in general (i.e. those not using perfect hashing), but this is rarely relevant in practice.

Any dependable hash table implementation, coupled with a half decent hash, has a retrieval performance of O(1) with a very small factor (2, in fact) in the expected case, within a very narrow margin of variance.


Only in the theoretical case where hashcodes are always different, and the bucket for every hash code is also different, will a lookup take a single probe. Otherwise lookup is still of constant order on average; that is, as the hashmap grows, its expected search time remains constant.

