What is a good Hash Function?

Question

What is a good hash function? I saw a lot of hash functions and applications in my data structures courses in college, but I mostly got that it's pretty hard to make a good hash function. As a rule of thumb to avoid collisions, my professor said that:

    function Hash(key)
      return key mod PrimeNumber
    end

(mod is the % operator in C and similar languages)

with the prime number being the size of the hash table. I get that this is a somewhat good function to avoid collisions and a fast one, but how can I make a better one? Are there better hash functions for string keys as opposed to numeric keys?
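For reference, the professor's rule of thumb might look like this in C. This is only a minimal sketch: the function name `hash_mod_prime` and the `table_size` parameter are made up for illustration, and the caller is assumed to pass a prime table size.

```c
#include <assert.h>
#include <stdint.h>

/* Map an integer key to a bucket index in [0, table_size).
 * table_size is assumed to be a prime chosen by the caller. */
static uint32_t hash_mod_prime(uint32_t key, uint32_t table_size)
{
    assert(table_size > 0);     /* avoid division by zero */
    return key % table_size;
}
```

With a table of 101 buckets, for example, the key 205 lands in bucket 3 because 205 = 2 * 101 + 3.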

User · Accepted Answer

For doing "normal" hash table lookups on basically any kind of data, this one by Paul Hsieh is the best I've ever used:

http://www.azillionmonkeys.com/qed/hash.html

If you care about cryptographic security or anything else more advanced, then YMMV. If you just want a kick-ass general-purpose hash function for a hash table lookup, then this is what you're looking for.

User · Answer

I'd say that the main rule of thumb is not to roll your own. Try to use something that has been thoroughly tested, e.g., SHA-1 or something along those lines.

User · Answer

What you're saying here is that you want one that has collision resistance. Try using SHA-2. Or try using a (good) block cipher in a one-way compression function (never tried that before), like AES in Miyaguchi-Preneel mode. The problem with that is that you need to:

1) have an IV. Try using the first 256 bits of the fractional part of Khinchin's constant or something like that.
2) have a padding scheme. Easy. Borrow it from a hash like MD5 or SHA-3 (Keccak [pronounced 'ket-chak']).

If you don't care about the security (a few others said this), look at FNV or lookup2 by Bob Jenkins (actually I'm the first one who recommends lookup2). Also try MurmurHash, it's fast (check this: 16 cpb).

User · Answer

A good hash function has the following properties:

1. Given a hash of a message, it is computationally infeasible for an attacker to find another message such that their hashes are identical.
2. It is computationally infeasible to find any pair of messages, m and m', such that h(m) = h(m').

The two cases are not the same. In the first case, there is a pre-existing hash that you're trying to find a collision for. In the second case, you're trying to find any two messages that collide. The second task is significantly easier due to the birthday "paradox".

Where performance is not that great an issue, you should always use a secure hash function. There are very clever attacks that can be performed by forcing collisions in a hash. If you use something strong from the outset, you'll secure yourself against these.

Don't use MD5 or SHA-1 in new designs. Most cryptographers, me included, would consider them broken. The principal source of weakness in both of these designs is that the second property, which I outlined above, does not hold for these constructions. If an attacker can generate two messages, m and m', that both hash to the same value, they can use these messages against you. SHA-1 and MD5 also suffer from message extension attacks, which can fatally weaken your application if you're not careful.

A more modern hash such as Whirlpool is a better choice. It does not suffer from these message extension attacks and uses the same mathematics as AES uses to prove security against a variety of attacks.

Hope that helps!
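To put rough numbers on the birthday "paradox" remark: assuming an ideal hash with an n-bit output (the symbol n is introduced here for illustration, it is not from the answer), the generic brute-force costs of the two tasks are approximately:

```latex
\text{find a second message for a given hash: } \approx 2^{n} \text{ hash evaluations}
\qquad
\text{find any colliding pair: } \approx 2^{n/2} \text{ hash evaluations}
```

So for a 128-bit output, a generic collision search needs on the order of 2^64 evaluations rather than 2^128, which is why the second property is the first to fail in practice.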

User · Answer

A good hash function should:

1. be bijective so as not to lose information, where possible, and have as few collisions as possible;
2. cascade as much and as evenly as possible, i.e. each input bit should flip every output bit with probability 0.5 and without obvious patterns;
3. if used in a cryptographic context, not have an efficient way to invert it.

A prime number modulus does not satisfy any of these points. It is simply insufficient. It is often better than nothing, but it's not even fast. Multiplying with an unsigned integer and taking a power-of-two modulus distributes the values just as well (that is, not well at all), but with only about 2 CPU cycles it is much faster than the 15 to 40 a prime modulus will take (yes, integer division really is that slow).

To create a hash function that is fast and distributes the values well, the best option is to compose it from fast permutations with lesser qualities, as was done with PCG for random number generation.

Useful permutations, among others, are:

- multiplication with an odd integer
- binary rotations
- xorshift

Following this recipe we can create our own hash function, or we can take splitmix, which is tested and well accepted.

If cryptographic qualities are needed, I would highly recommend using a function of the SHA family, which is well tested and standardised, but for educational purposes this is how you would make one: first you take a good non-cryptographic hash function, then you apply a one-way function like exponentiation on a prime field, or k applications of (n*(n+1)/2) mod 2^k interspersed with an xorshift, where k is the number of bits in the resulting hash.
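As an illustration of the recipe, here is a sketch of a 64-bit mixer in the style of the widely circulated splitmix64 code, built only from an addition, xorshifts, and multiplications by odd constants. The names `mix64` and `bucket_index` are made up for this example, and treating this as "the" splitmix the answer refers to is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64-style mixer: every step is a permutation of the 64-bit
 * integers, so the whole function is a bijection (no information lost). */
static uint64_t mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;                   /* add an odd constant */
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;  /* xorshift, odd multiply */
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;  /* xorshift, odd multiply */
    return x ^ (x >> 31);                         /* final xorshift */
}

/* Pick one of 2^log2_buckets buckets using the top bits of the mixed key.
 * No division is needed; log2_buckets must be in the range 1..32. */
static uint32_t bucket_index(uint64_t key, unsigned log2_buckets)
{
    assert(log2_buckets >= 1 && log2_buckets <= 32);
    return (uint32_t)(mix64(key) >> (64 - log2_buckets));
}
```

Because the table size is a power of two, selecting a bucket is a single shift, which matches the point above about avoiding the integer division that a prime modulus requires.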

User · Answer

There's no such thing as a "good hash function" for universal hashes (ed. yes, I know there's such a thing as "universal hashing" but that's not what I meant). Depending on the context, different criteria determine the quality of a hash. Two people already mentioned SHA. This is a cryptographic hash and it isn't at all good for hash tables, which is what you probably mean.

Hash tables have very different requirements. But still, finding a good hash function universally is hard because different data types expose different information that can be hashed. As a rule of thumb, it is good to consider all information a type holds equally. This is not always easy or even possible. For reasons of statistics (and hence collision), it is also important to generate a good spread over the problem space, i.e. all possible objects. This means that when hashing numbers between 100 and 1050 it's no good to let the most significant digit play a big part in the hash, because for ~90% of the objects this digit will be 0. It's far more important to let the last three digits determine the hash.

Similarly, when hashing strings it's important to consider all characters, except when it's known in advance that the first three characters of all strings will be the same; considering these then is a waste.

This is actually one of the cases where I advise reading what Knuth has to say in The Art of Computer Programming, vol. 3. Another good read is Julienne Walker's The Art of Hashing.
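The point about numbers between 100 and 1050 can be checked with a small sketch. The function names here are hypothetical and exist only for this illustration; the "leading digit" is taken to be the thousands digit.

```c
#include <assert.h>

/* Hash on the most significant (thousands) digit only. */
static unsigned hash_leading_digit(unsigned n) { return n / 1000u; }

/* Hash on the last three decimal digits. */
static unsigned hash_last_three(unsigned n) { return n % 1000u; }

/* Count the keys in [lo, hi] that hash_leading_digit sends to bucket 0. */
static unsigned count_in_bucket_zero(unsigned lo, unsigned hi)
{
    unsigned count = 0;
    unsigned n;

    assert(lo <= hi);
    for (n = lo; n <= hi; n++)
        if (hash_leading_digit(n) == 0)
            count++;
    return count;
}
```

For the range 100 to 1050, `count_in_bucket_zero` reports 900 of the 951 keys in a single bucket, whereas `hash_last_three` gives every key in that range a distinct value.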

User · Answer

This is an example of a good one, and also an example of why you would never want to write one. It is a Fowler/Noll/Vo (FNV) hash, which is equal parts computer science genius and pure voodoo:

    unsigned fnv_hash_1a_32(void *key, int len)
    {
        unsigned char *p = key;
        unsigned h = 0x811c9dc5;    /* 32-bit FNV offset basis */
        int i;

        /* XOR in each byte, then multiply by the 32-bit FNV prime */
        for (i = 0; i < len; i++)
            h = (h ^ p[i]) * 0x01000193;

        return h;
    }

    unsigned long long fnv_hash_1a_64(void *key, int len)
    {
        unsigned char *p = key;
        unsigned long long h = 0xcbf29ce484222325ULL;    /* 64-bit FNV offset basis */
        int i;

        /* XOR in each byte, then multiply by the 64-bit FNV prime */
        for (i = 0; i < len; i++)
            h = (h ^ p[i]) * 0x100000001b3ULL;

        return h;
    }

Edit:

Landon Curt Noll recommends on his site the FNV-1a algorithm over the original FNV-1 algorithm: the improved algorithm better disperses the last byte in the hash. I adjusted the algorithm accordingly.

User · Answer

There are two major purposes of hashing functions:

- to disperse data points uniformly into n bits.
- to securely identify the input data.

It's impossible to recommend a hash without knowing what you're using it for.

If you're just making a hash table in a program, then you don't need to worry about how reversible or hackable the algorithm is. SHA-1 or AES is completely unnecessary for this; you'd be better off using a variation of FNV. FNV achieves better dispersion (and thus fewer collisions) than a simple prime mod like you mentioned, and it's more adaptable to varying input sizes.

If you're using the hashes to hide and authenticate public information (such as hashing a password, or a document), then you should use one of the major hashing algorithms vetted by public scrutiny. The Hash Function Lounge is a good place to start.
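For the hash-table case, a minimal sketch of picking a bucket for a string key with 32-bit FNV-1a could look like the following. The helper name `bucket_for` and the `num_buckets` parameter are assumptions made for this example; the offset basis and prime are the standard 32-bit FNV constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a over an arbitrary byte buffer. */
static uint32_t fnv1a_32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 0x811c9dc5u;       /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];                  /* mix in the next byte */
        h *= 0x01000193u;           /* multiply by the FNV prime */
    }
    return h;
}

/* Map a NUL-terminated string key to a bucket in [0, num_buckets). */
static size_t bucket_for(const char *key, size_t num_buckets)
{
    assert(num_buckets > 0);
    return fnv1a_32(key, strlen(key)) % num_buckets;
}
```

Every byte of the key feeds into the result, so keys of any length are handled the same way, which is the adaptability to varying input sizes mentioned above.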
