What integer hash functions are good that accept an integer hash key?

Question

User · Answer

There's a nice overview of some hash algorithms at Eternally Confuzzled. I'd recommend Bob Jenkins' one-at-a-time hash, which quickly reaches avalanche and therefore can be used for efficient hash table lookup.
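A minimal C sketch of the one-at-a-time hash, applied to an integer key by feeding it the key's bytes. The wrapper name `hash_u32_oaat` is hypothetical, and it feeds the bytes least significant first (an assumption made so the result does not depend on the machine's byte order).

```c
#include <stddef.h>
#include <stdint.h>

/* Bob Jenkins' one-at-a-time hash over an arbitrary byte buffer. */
uint32_t one_at_a_time(const uint8_t *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i) {
        hash += key[i];          /* mix in one byte */
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    /* final avalanche */
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}

/* Hypothetical wrapper: hash a 32-bit integer key byte by byte,
   least significant byte first, so the result is endian-independent. */
uint32_t hash_u32_oaat(uint32_t key)
{
    uint8_t bytes[4];
    for (int i = 0; i < 4; ++i)
        bytes[i] = (uint8_t)(key >> (8 * i));
    return one_at_a_time(bytes, sizeof bytes);
}
```

Because the hash consumes one byte per round, a 32-bit key costs four rounds of the inner loop plus the final avalanche; the same function also works unchanged for strings.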

User · Answer

It depends on how your data is distributed. For a simple counter, the simplest function, f(i) = i, will be good (I suspect optimal, but I can't prove it).
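A small C sketch of why the answer depends on the data, assuming the identity hash feeds a table of 2^bits buckets indexed by the low bits (the `max_bucket_load` helper is hypothetical): consecutive counter values land in distinct buckets, while keys that are all multiples of the table size land in the same one.

```c
#include <stdint.h>
#include <stdlib.h>

/* f(i) = i, the identity hash. */
static inline uint32_t identity_hash(uint32_t i) { return i; }

/* Hypothetical helper: insert the n keys start, start+stride, start+2*stride, ...
   into a table of 2^bits buckets (bucket = low bits of the hash) and return
   the largest number of keys sharing one bucket. Returns 0 if allocation fails. */
uint32_t max_bucket_load(uint32_t start, uint32_t stride, uint32_t n, unsigned bits)
{
    uint32_t size = 1u << bits;
    uint32_t *count = calloc(size, sizeof *count);
    uint32_t max = 0;
    if (count == NULL)
        return 0;
    for (uint32_t k = 0; k < n; ++k) {
        uint32_t b = identity_hash(start + k * stride) & (size - 1u);
        if (++count[b] > max)
            max = count[b];
    }
    free(count);
    return max;
}
```

With `bits = 10`, `max_bucket_load(0, 1, 1024, 10)` returns 1 (a counter fills every bucket exactly once), whereas `max_bucket_load(0, 1024, 1024, 10)` returns 1024 (every key collides in bucket 0).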

User · Answer

This page lists some simple hash functions that tend to do decently in general, but any simple hash has pathological cases where it doesn't work well.

User · Answer

I have been using splitmix64 (pointed out in Thomas Mueller's answer) ever since I found this thread. However, I recently stumbled upon Pelle Evensen's rrxmrrxmsx_0, which yielded tremendously better statistical distribution than the original MurmurHash3 finalizer and its successors (splitmix64 and other mixes). Here is the code snippet in C:

    #include <stdint.h>

    /* Rotate right; r must be in 1..63. */
    static inline uint64_t ror64(uint64_t v, int r) {
        return (v >> r) | (v << (64 - r));
    }

    uint64_t rrxmrrxmsx_0(uint64_t v) {
        v ^= ror64(v, 25) ^ ror64(v, 50);
        v *= 0xA24BAED4963EE407UL;
        v ^= ror64(v, 24) ^ ror64(v, 49);
        v *= 0x9FB21C651E98DF25UL;
        return v ^ (v >> 28);
    }

Pelle also provides an in-depth analysis of the 64-bit mixer used in the final step of MurmurHash3 and the more recent variants.
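For comparison, here is a sketch of the 64-bit finalizer (`fmix64`) from MurmurHash3, the mixer Pelle's analysis starts from. It is built only from invertible steps (xor-shifts and multiplications by odd constants), so distinct inputs always give distinct outputs; like rrxmrrxmsx_0, it maps 0 to 0. The `bucket64` helper, which takes the high bits as a table index, is an illustrative addition.

```c
#include <stdint.h>

/* MurmurHash3 64-bit finalizer (fmix64). Every step is invertible:
   k ^= k >> 33 can be undone, and both multipliers are odd. */
static inline uint64_t fmix64(uint64_t k)
{
    k ^= k >> 33;
    k *= UINT64_C(0xff51afd7ed558ccd);
    k ^= k >> 33;
    k *= UINT64_C(0xc4ceb9fe1a85ec53);
    k ^= k >> 33;
    return k;
}

/* Hypothetical helper: bucket index for a table of 2^bits buckets
   (1 <= bits <= 63), taken from the high bits of the mixed value. */
static inline uint64_t bucket64(uint64_t key, unsigned bits)
{
    return fmix64(key) >> (64 - bits);
}
```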

User · Answer

Fast and good hash functions can be composed from fast permutations with lesser qualities, like

- multiplication with an uneven (odd) integer
- binary rotations
- xorshift

to yield a hashing function with superior qualities, as demonstrated with PCG for random number generation.

This is in fact also the recipe rrxmrrxmsx_0 and MurmurHash are using, knowingly or unknowingly.

I personally found

    uint64_t xorshift(const uint64_t& n, int i) {
        return n ^ (n >> i);
    }

    uint64_t hash(const uint64_t& n) {
        uint64_t p = 0x5555555555555555ull;   // pattern of alternating 0 and 1
        uint64_t c = 17316035218449499591ull; // random uneven integer constant
        return c * xorshift(p * xorshift(n, 32), 32);
    }

to be good enough.

A good hash function should

1. be bijective so as not to lose information, if possible, and have the fewest collisions;
2. cascade as much and as evenly as possible, i.e. each input bit should flip every output bit with probability 0.5.

Let's first look at the identity function. It satisfies 1. but not 2.: input bit n determines output bit n with a correlation of 100% (red) and no others (they are therefore blue), giving a perfect red line across.

A xorshift(n, 32) is not much better, yielding one and a half lines. It still satisfies 1., because it is inverted by a second application.

A multiplication with an uneven integer ("Knuth's multiplicative method") is much better, cascading more strongly and flipping more output bits with a probability of 0.5, which is what you want (in green). It satisfies 1., as for each uneven integer there is a multiplicative inverse.

Combining the two gives the following output, still satisfying 1., as the composition of two bijective functions yields another bijective function.

A second application of multiplication and xorshift will yield the following.

Or you can use Galois field multiplications like GHash; they have become reasonably fast on modern CPUs and have superior qualities in one step:

    #include <cstdint>
    #include <immintrin.h> // _mm_clmulepi64_si128; compile with -mpclmul

    // Indexing __m128i with [] relies on the GCC/Clang vector extension.
    uint64_t const inline gfmul(const uint64_t& i, const uint64_t& j) {
        __m128i I{}; I[0] ^= i;
        __m128i J{}; J[0] ^= j;
        __m128i M{}; M[0] ^= 0xb000000000000000ull;
        __m128i X = _mm_clmulepi64_si128(I, J, 0);
        __m128i A = _mm_clmulepi64_si128(X, M, 0);
        __m128i B = _mm_clmulepi64_si128(A, M, 0);
        return A[0] ^ A[1] ^ B[1] ^ X[0] ^ X[1];
    }
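Property 1 (bijectivity) can be checked directly for the xorshift-multiply `hash` above: xorshift by 32 on a 64-bit word is its own inverse, and each odd multiplier has a multiplicative inverse modulo 2^64, which Newton's iteration finds in five steps. This sketch is in C rather than the answer's C++ (so the reference parameters become plain values), and the names `mix`, `unmix` and `inverse_odd` are hypothetical.

```c
#include <stdint.h>

static const uint64_t P = 0x5555555555555555ULL;   /* alternating 0 and 1 */
static const uint64_t C = 17316035218449499591ULL; /* odd constant from the answer */

/* x ^ (x >> 32): applying it twice returns the original 64-bit value. */
static inline uint64_t xorshift32(uint64_t x) { return x ^ (x >> 32); }

/* Same composition as the answer's hash(): c * xorshift(p * xorshift(n, 32), 32). */
uint64_t mix(uint64_t n) { return C * xorshift32(P * xorshift32(n)); }

/* Inverse of an odd a modulo 2^64 via Newton's iteration x <- x * (2 - a * x).
   x = a is already correct in the low 3 bits (a * a == 1 mod 8 for odd a),
   and each step doubles that: 3, 6, 12, 24, 48, 96 bits. */
uint64_t inverse_odd(uint64_t a)
{
    uint64_t x = a;
    for (int i = 0; i < 5; ++i)
        x *= 2 - a * x;
    return x;
}

/* Undo mix() step by step, in reverse order. */
uint64_t unmix(uint64_t h)
{
    return xorshift32(inverse_odd(P) * xorshift32(inverse_odd(C) * h));
}
```

Since `unmix(mix(n)) == n` for every 64-bit `n`, no two inputs can share a hash value, which is exactly property 1.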

User · Answer

For random hash values, some engineers said the golden ratio prime number (2654435761) is a bad choice, but my test results show that this is not true; instead, 2654435761 distributes the hash values pretty well.

    #define MCR_HashTableSize (1u << 10) /* 2^10 buckets; must be a power of two */

    unsigned int
    Hash_UInt_GRPrimeNumber(unsigned int key)
    {
        key = key * 2654435761u & (MCR_HashTableSize - 1);
        return key;
    }

The hash table size must be a power of two.

I have written a test program to evaluate many hash functions for integers, and the results show that GRPrimeNumber is a pretty good choice. I have tried:

1. total_data_entry_number / total_bucket_number = 2, 3, 4, where total_bucket_number = hash table size;
2. mapping the hash value domain into the bucket index domain, that is, converting a hash value into a bucket index by a logical AND with (hash_table_size - 1), as shown in Hash_UInt_GRPrimeNumber();
3. calculating the collision number of each bucket;
4. recording the buckets that have not been mapped, that is, empty buckets;
5. finding the max collision number over all buckets, that is, the longest chain length.

With my test results, I found that the Golden Ratio Prime Number always has fewer empty buckets (or zero empty buckets) and the shortest collision chain length.

Some hash functions for integers are claimed to be good, but the test results show that when total_data_entry / total_bucket_number = 3, the longest chain length is bigger than 10 (max collision number > 10) and many buckets are not mapped (empty buckets), which is very bad compared with the result of zero empty buckets and a longest chain length of 3 for Golden Ratio Prime Number hashing.

BTW, with my test results, I found one version of the shifting-xor hash functions is pretty good (it's shared by mikera):

    unsigned int
    Hash_UInt_M3(unsigned int key)
    {
        key ^= (key << 13);
        key ^= (key >> 17);
        key ^= (key << 5);
        return key;
    }
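The measurements described above (empty buckets and longest chain for a given load factor) can be sketched in C as follows. The `BucketStats` struct and `bucket_stats` helper are hypothetical names, and the keys are assumed to be the sequential integers 0, 1, 2 and so on, since the answer does not say which keys were tested.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t empty_buckets;  /* buckets that received no key */
    uint32_t longest_chain;  /* maximum number of keys in one bucket */
} BucketStats;

/* The golden-ratio-prime hash without the mask (the mask is applied below). */
static uint32_t hash_gr_prime(uint32_t key) { return key * 2654435761u; }

/* Insert keys 0 .. entries-1 into 2^table_bits buckets, using
   bucket = hash & (table_size - 1) as in the answer.
   Returns {0, 0} if the counter array cannot be allocated. */
BucketStats bucket_stats(uint32_t (*hash)(uint32_t), uint32_t entries, unsigned table_bits)
{
    uint32_t size = 1u << table_bits;
    BucketStats s = {0, 0};
    uint32_t *count = calloc(size, sizeof *count);
    if (count == NULL)
        return s;
    for (uint32_t key = 0; key < entries; ++key)
        ++count[hash(key) & (size - 1u)];
    for (uint32_t b = 0; b < size; ++b) {
        if (count[b] == 0)
            ++s.empty_buckets;
        if (count[b] > s.longest_chain)
            s.longest_chain = count[b];
    }
    free(count);
    return s;
}
```

Sequential keys are the easy case for a multiplier: because 2654435761 is odd, multiplying by it permutes the residues modulo a power-of-two table size, so 2048 or 3072 sequential keys in 1024 buckets give every bucket exactly 2 or 3 entries. Random or strided keys are a harder test.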

User · Answer

The answer depends on a lot of things, like: Where do you intend to employ it? What are you trying to do with the hash? Do you need a cryptographically secure hash function? I suggest that you take a look at the Merkle-Damgård family of hash functions, like SHA-1 etc.

User · Answer

I found the following algorithm provides a very good statistical distribution. Each input bit affects each output bit with about 50% probability. There are no collisions (each input results in a different output). The algorithm is fast except if the CPU doesn't have a built-in integer multiplication unit. C code, assuming int is 32 bit (for Java, replace >> with >>> and remove unsigned):

    unsigned int hash(unsigned int x) {
        x = ((x >> 16) ^ x) * 0x45d9f3b;
        x = ((x >> 16) ^ x) * 0x45d9f3b;
        x = (x >> 16) ^ x;
        return x;
    }

The magic number was calculated using a special multi-threaded test program that ran for many hours, which calculates the avalanche effect (the number of output bits that change if a single input bit is changed; should be nearly 16 on average), independence of output bit changes (output bits should not depend on each other), and the probability of a change in each output bit if any input bit is changed. The calculated values are better than the 32-bit finalizer used by MurmurHash, and nearly as good (not quite) as when using AES. A slight advantage is that the same constant is used twice (it did make it slightly faster the last time I tested, not sure if it's still the case).

You can reverse the process (get the input value from the hash) if you replace the 0x45d9f3b with 0x119de1f3 (its multiplicative inverse modulo 2^32):

    unsigned int unhash(unsigned int x) {
        x = ((x >> 16) ^ x) * 0x119de1f3;
        x = ((x >> 16) ^ x) * 0x119de1f3;
        x = (x >> 16) ^ x;
        return x;
    }

For 64-bit numbers, I suggest using the following, even though it might not be the fastest. This one is based on splitmix64, which seems to be based on the blog article Better Bit Mixing (mix 13).

    #include <stdint.h>

    uint64_t hash(uint64_t x) {
        x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
        x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
        x = x ^ (x >> 31);
        return x;
    }

For Java, use long, add L to the constant, replace >> with >>> and remove unsigned. In this case, reversing is more complicated:

    uint64_t unhash(uint64_t x) {
        x = (x ^ (x >> 31) ^ (x >> 62)) * UINT64_C(0x319642b2d24d8ec3);
        x = (x ^ (x >> 27) ^ (x >> 54)) * UINT64_C(0x96de1b173f119089);
        x = x ^ (x >> 30) ^ (x >> 60);
        return x;
    }

Update: You may also want to look at the Hash Function Prospector project, where other (possibly better) constants are listed.
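The avalanche figure quoted above (about 16 output bits changing per flipped input bit) can be estimated with a much smaller test than the author's multi-threaded search. This sketch flips each input bit of a batch of pseudo-random inputs and averages the number of changed output bits; the input generator (a 32-bit linear congruential generator with the common constants 1664525 and 1013904223), the seed, and the sample size are arbitrary choices, not the author's program.

```c
#include <stdint.h>

/* The 32-bit hash from the answer above, and its inverse. */
static uint32_t hash32(uint32_t x)
{
    x = ((x >> 16) ^ x) * 0x45d9f3bu;
    x = ((x >> 16) ^ x) * 0x45d9f3bu;
    x = (x >> 16) ^ x;
    return x;
}

static uint32_t unhash32(uint32_t x)
{
    x = ((x >> 16) ^ x) * 0x119de1f3u;
    x = ((x >> 16) ^ x) * 0x119de1f3u;
    x = (x >> 16) ^ x;
    return x;
}

/* Number of set bits (portable, no compiler builtins). */
static unsigned popcount32(uint32_t v)
{
    unsigned n = 0;
    for (; v != 0; v &= v - 1)
        ++n;
    return n;
}

/* Average number of output bits that change when one input bit is flipped,
   over `samples` inputs times 32 bit positions (samples must be > 0).
   The ideal value for a 32-bit hash is 16. */
double avalanche32(uint32_t (*h)(uint32_t), unsigned samples)
{
    uint32_t x = 12345u;                 /* arbitrary seed */
    unsigned long long flipped = 0;
    for (unsigned s = 0; s < samples; ++s) {
        x = x * 1664525u + 1013904223u;  /* next pseudo-random input */
        uint32_t base = h(x);
        for (int bit = 0; bit < 32; ++bit)
            flipped += popcount32(base ^ h(x ^ (1u << bit)));
    }
    return (double)flipped / ((double)samples * 32.0);
}
```

For reference, a function that returns its input unchanged scores exactly 1.0 on this measure, since flipping one input bit changes exactly one output bit.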

User · Answer

Knuth's multiplicative method:

    hash(i) = i * 2654435761 mod 2^32

In general, you should pick a multiplier that is in the order of your hash size (2^32 in the example) and has no common factors with it. This way the hash function covers all your hash space uniformly.

Edit: The biggest disadvantage of this hash function is that it preserves divisibility, so if your integers are all divisible by 2 or by 4 (which is not uncommon), their hashes will be too. This is a problem in hash tables when the bucket index is taken from the low bits of the hash: you can end up with only 1/2 or 1/4 of the buckets being used.
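A short C sketch of the divisibility problem and one common way around it: instead of masking the low bits of i * 2654435761 (which keep the factors of 2 of i), take the top bits of the 32-bit product. The helper `buckets_used`, the 1024-bucket table, and the choice of keys (all multiples of 4) are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 10                 /* 1024 buckets */
#define TABLE_SIZE (1u << TABLE_BITS)

static inline uint32_t knuth32(uint32_t i) { return i * 2654435761u; }

/* Bucket from the low bits: inherits i's factors of 2. */
static inline uint32_t bucket_low(uint32_t i) { return knuth32(i) & (TABLE_SIZE - 1u); }

/* Bucket from the high bits of the 32-bit product. */
static inline uint32_t bucket_high(uint32_t i) { return knuth32(i) >> (32 - TABLE_BITS); }

/* Hypothetical helper: count distinct buckets hit by the keys 0, 4, 8, ..., 4 * (n - 1). */
unsigned buckets_used(uint32_t (*bucket)(uint32_t), uint32_t n)
{
    bool used[TABLE_SIZE];
    unsigned distinct = 0;
    memset(used, 0, sizeof used);
    for (uint32_t k = 0; k < n; ++k) {
        uint32_t b = bucket(4u * k);
        if (!used[b]) {
            used[b] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

With 1024 keys, `buckets_used(bucket_low, 1024)` returns 256: every key is a multiple of 4, so every product is too, and only a quarter of the buckets can ever be hit. `bucket_high` has no such restriction and spreads the same keys over many more buckets.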

User · Answer

32-bit multiplicative method (very fast), see @rafal:

    #include <stdint.h>

    /* The cast keeps only the low 32 bits of the product. */
    #define hash32(x) ((uint32_t)((x) * 2654435761u))
    #define H_BITS 24 /* hash table size is 2^H_BITS */
    #define H_SHIFT (32 - H_BITS)

    unsigned hashtab[1 << H_BITS];
    ....
    unsigned slot = hash32(x) >> H_SHIFT;

32-bit and 64-bit (good distribution) at: MurmurHash (Integer Hash Function)
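One well-known MurmurHash-derived integer mixer is the 32-bit finalizer `fmix32` from MurmurHash3; whether it is exactly what the link above points to is an assumption. Here it is as a standalone C function, combined with the same high-bits slot selection as the multiplicative version (the `slot` helper and its `h_bits` parameter are illustrative).

```c
#include <stdint.h>

/* MurmurHash3 32-bit finalizer (fmix32): xor-shifts and odd multipliers,
   so it is a bijection on 32-bit values. */
static inline uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Hypothetical helper: slot in a table of 2^h_bits entries (1 <= h_bits <= 32),
   taken from the high bits of the mixed key. */
static inline uint32_t slot(uint32_t key, unsigned h_bits)
{
    return fmix32(key) >> (32 - h_bits);
}
```

Unlike the plain multiplicative hash, the final xor-shift folds high bits back into the low ones, so masking the low bits would also work here.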

User · Answer

I don't think we can say that a hash function is "good" without knowing your data in advance, and without knowing what you're going to do with it. There are better data structures than hash tables for unknown data sizes (I'm assuming you're doing the hashing for a hash table here). I would personally use a hash table when I know I have a "finite" number of elements that need to be stored in a limited amount of memory. I would try to do a quick statistical analysis of my data, to see how it is distributed etc., before I start thinking about my hash function.
