We are currently covering hash functions in my class. Our instructor asked us to find a hash function on the internet to compare with the two we have used in our code.
The first one:
int HashTable::hash (string word)
// POST: the index of entry is returned
{
    int sum = 0;
    for (size_t k = 0; k < word.length(); k++)
        sum = sum + static_cast<unsigned char>(word[k]);  // add up the character codes
    return sum % SIZE;
}
Second:
int HashTable::hash (string word)
{
    int seed = 131;
    unsigned long hash = 0;
    for (size_t i = 0; i < word.length(); i++)
    {
        hash = (hash * seed) + word[i];  // multiply by the seed, then add the next character
    }
    return hash % SIZE;
}
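One concrete difference between the two: the first function only adds character codes, so words made of the same letters in a different order (anagrams such as "stop", "pots", and "tops") always land in the same slot, while the second multiplies by 131 before each addition, so letter order changes the result. A sketch comparing them as free functions; additiveHash and seededHash are hypothetical names, SIZE is fixed at 501 as in the question, and characters are read as unsigned char so non-ASCII bytes cannot make the sum negative:

```cpp
#include <cstddef>
#include <string>

const int SIZE = 501;  // table size from the question

// First function: order-insensitive sum of character codes.
int additiveHash(const std::string& word) {
    int sum = 0;
    for (std::size_t k = 0; k < word.length(); ++k)
        sum += static_cast<unsigned char>(word[k]);
    return sum % SIZE;
}

// Second function: multiply by the seed 131 before adding each character,
// so the same letters in a different order give a different value.
int seededHash(const std::string& word) {
    unsigned long hash = 0;
    for (std::size_t i = 0; i < word.length(); ++i)
        hash = hash * 131 + static_cast<unsigned char>(word[i]);
    return static_cast<int>(hash % SIZE);
}
```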
Here SIZE is 501 (the size of the hash table), and the input comes from a text file of 20,000+ words.
I have seen this question asked before with a few code examples, but I wasn't sure what to look for in a hash function. If I understand correctly, in my case a hash function takes an input (a string), performs a calculation that maps it to a number, and that number is used as the index where the string is inserted in the table. Is the point of this to make searching faster than scanning a list?
If my logic is sound, does anyone have a good example of, or a resource on, a different hash function for strings? Or even an explanation of how to write my own efficient one?
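The picture in the question is essentially right: the hash turns the word into a slot number, so a lookup only has to check the few words stored in that one slot instead of scanning the whole list. A minimal sketch of that idea, assuming separate chaining (each slot holds a list of the words that hashed to it); ChainedTable, index, insert, and contains are hypothetical names, not the course's HashTable class, and the hash used is the question's multiply-by-131 scheme:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical separate-chaining table: each of the SIZE slots holds a
// list of the words whose hash landed in that slot.
class ChainedTable {
public:
    static constexpr std::size_t SIZE = 501;

    ChainedTable() : slots_(SIZE) {}

    // Multiply-by-131 hash (as in the question's second function),
    // reduced to a slot index.
    static std::size_t index(const std::string& word) {
        unsigned long h = 0;
        for (std::size_t i = 0; i < word.length(); ++i)
            h = h * 131 + static_cast<unsigned char>(word[i]);
        return h % SIZE;
    }

    // Store the word in its slot unless it is already there.
    void insert(const std::string& word) {
        std::list<std::string>& chain = slots_[index(word)];
        for (const std::string& w : chain)
            if (w == word) return;
        chain.push_back(word);
    }

    // Only the one chain the word hashes to is searched.
    bool contains(const std::string& word) const {
        const std::list<std::string>& chain = slots_[index(word)];
        for (const std::string& w : chain)
            if (w == word) return true;
        return false;
    }

private:
    std::vector<std::list<std::string>> slots_;
};
```

A lookup then compares the word only against the other words in its own slot, which is why an even spread across all 501 slots matters.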
Hash functions for algorithmic use usually have two goals: first, they have to be fast; second, they have to distribute the values evenly across the possible outputs. A hash function is also required to always return the same number for the same input value.
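One rough way to check the even-distribution goal is to hash every word and look at the slot loads: with 20,000 words in 501 slots an ideal function leaves about 40 words in each slot, so a much larger maximum or many empty slots signals clustering. A sketch, where BucketStats and measure are hypothetical helper names and the hash is passed in so different candidates can be compared on the same word list:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct BucketStats {
    std::size_t maxLoad = 0;     // number of words in the fullest slot
    std::size_t emptySlots = 0;  // slots that received no word at all
};

// Hash every word into `size` slots (size must be > 0) and summarize the loads.
BucketStats measure(const std::vector<std::string>& words, std::size_t size,
                    const std::function<std::size_t(const std::string&)>& hash) {
    std::vector<std::size_t> load(size, 0);
    for (const std::string& w : words)
        ++load[hash(w) % size];
    BucketStats s;
    s.maxLoad = *std::max_element(load.begin(), load.end());
    s.emptySlots = static_cast<std::size_t>(
        std::count(load.begin(), load.end(), std::size_t{0}));
    return s;
}
```

Running it with size 501 on the real word file for each candidate gives a quick, informal comparison; it is not a rigorous statistical test.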
If your values are strings, here are some examples of bad hash functions:
string[0]
- the ASCII letters a-z and A-Z occur far more often than other characters, so only a few dozen values are ever used
string.length()
- word lengths fall in a narrow range of small values, so most possible values are never used
Good hash functions try to use every bit of the input while keeping the calculation time minimal. If you only need a simple hash code, try multiplying the bytes by prime numbers and summing them.
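The two bad examples, written out as hypothetical functions (firstCharHash, lengthHash) to make the collisions concrete. The suggested fix, multiplying by a prime as you go, is essentially what the question's second function already does, since its seed 131 is prime:

```cpp
#include <cstddef>
#include <string>

// Bad example 1: only the first character is used, so all words that
// start with the same letter collide.
std::size_t firstCharHash(const std::string& s) {
    return s.empty() ? 0 : static_cast<unsigned char>(s[0]);
}

// Bad example 2: only the length is used, and word lengths fall in a
// narrow range, so almost all slots of a 501-slot table stay empty.
std::size_t lengthHash(const std::string& s) {
    return s.length();
}
```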
-- The way to go these days --
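Use SipHash, for your own protection: it is keyed with a secret value, so an attacker who does not know the key cannot deliberately feed your table strings that all collide (a "hash flooding" denial-of-service attack).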
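A compact SipHash-2-4 sketch (the variant with 2 rounds per 8-byte block and 4 finalization rounds, designed by Jean-Philippe Aumasson and Daniel J. Bernstein), written from the published algorithm description for illustration only; production code should use a vetted library implementation. The names siphash24, sipRound, rotl, and load64 are hypothetical, and the 16-byte key is assumed to be filled from a random source once at program start:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Rotate a 64-bit word left by b bits (0 < b < 64).
static inline std::uint64_t rotl(std::uint64_t x, int b) {
    return (x << b) | (x >> (64 - b));
}

// One SipRound: add-rotate-xor mixing of the four state words.
static inline void sipRound(std::uint64_t& v0, std::uint64_t& v1,
                            std::uint64_t& v2, std::uint64_t& v3) {
    v0 += v1; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32);
    v2 += v3; v3 = rotl(v3, 16); v3 ^= v2;
    v0 += v3; v3 = rotl(v3, 21); v3 ^= v0;
    v2 += v1; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32);
}

// Read 8 bytes as a little-endian integer, independent of host byte order.
static inline std::uint64_t load64(const unsigned char* p) {
    std::uint64_t r = 0;
    for (int i = 7; i >= 0; --i) r = (r << 8) | p[i];
    return r;
}

// SipHash-2-4 of `len` bytes at `data`, keyed with a 16-byte secret `key`.
std::uint64_t siphash24(const unsigned char* data, std::size_t len,
                        const unsigned char key[16]) {
    const std::uint64_t k0 = load64(key), k1 = load64(key + 8);
    std::uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
    std::uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
    std::uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
    std::uint64_t v3 = 0x7465646279746573ULL ^ k1;

    // Compression: absorb each full 8-byte block with 2 rounds.
    const std::size_t full = len - len % 8;
    for (std::size_t off = 0; off < full; off += 8) {
        const std::uint64_t m = load64(data + off);
        v3 ^= m;
        sipRound(v0, v1, v2, v3);
        sipRound(v0, v1, v2, v3);
        v0 ^= m;
    }

    // Last block: leftover bytes (little-endian) plus the length in the top byte.
    std::uint64_t b = static_cast<std::uint64_t>(len) << 56;
    for (std::size_t i = 0; i < len % 8; ++i)
        b |= static_cast<std::uint64_t>(data[full + i]) << (8 * i);
    v3 ^= b;
    sipRound(v0, v1, v2, v3);
    sipRound(v0, v1, v2, v3);
    v0 ^= b;

    // Finalization: 4 rounds, then fold the state into 64 bits.
    v2 ^= 0xff;
    for (int i = 0; i < 4; ++i) sipRound(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}

// Convenience overload for strings.
std::uint64_t siphash24(const std::string& s, const unsigned char key[16]) {
    return siphash24(reinterpret_cast<const unsigned char*>(s.data()), s.size(), key);
}
```

For the question's table, an index would then be siphash24(word, key) % 501.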
-- Old and Dangerous --
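unsigned int RSHash(const std::string& str)
{
    unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;
    for(std::size_t i = 0; i < str.length(); i++)
    {
        hash = hash * a + str[i];
        a = a * b;  // the multiplier itself changes after every character
    }
    return (hash & 0x7FFFFFFF);  // clear the top bit so the result fits in a non-negative int
}

unsigned int JSHash(const std::string& str)
{
    unsigned int hash = 1315423911;
    for(std::size_t i = 0; i < str.length(); i++)
    {
        // mix the character in together with left- and right-shifted copies of the hash
        hash ^= ((hash << 5) + str[i] + (hash >> 2));
    }
    return (hash & 0x7FFFFFFF);  // clear the top bit, as in RSHash
}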
Search Google for "general purpose hash function".
Use boost::hash:
#include <boost/functional/hash.hpp>
#include <string>
...
std::string a = "ABCDE";
std::size_t b = boost::hash_value(a);  // reduce with b % SIZE to get a table index
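If Boost is not available, C++11 and later ship the same kind of function object in the standard library, std::hash<std::string>. A sketch of reducing it to a slot in the question's 501-slot table; stdHashIndex is a hypothetical helper name, and the exact values std::hash produces can differ between standard library implementations (the standard only promises consistency within one run of the program):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: reduce the standard library's string hash to a slot index.
std::size_t stdHashIndex(const std::string& word, std::size_t size = 501) {
    std::hash<std::string> hasher;  // stateless function object
    return hasher(word) % size;
}
```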
Java's String implements hashCode like this:
public int hashCode()
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
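The formula does not need pow(): factoring out 31 repeatedly (Horner's rule) turns it into h = 31*h + s[i] for each character in turn. A sketch that reproduces Java's 32-bit wrap-around with an unsigned 32-bit type; javaStringHash is a hypothetical name, and it matches Java only for ASCII text, because Java's s[i] are 16-bit UTF-16 code units rather than bytes:

```cpp
#include <cstdint>
#include <string>

// s[0]*31^(n-1) + ... + s[n-1], evaluated with Horner's rule.
// Unsigned 32-bit arithmetic wraps the same way Java's int overflow does.
std::int32_t javaStringHash(const std::string& s) {
    std::uint32_t h = 0;  // the empty string hashes to 0
    for (unsigned char c : s)
        h = 31 * h + c;
    // Reinterpret the 32 bits as signed (modular conversion: guaranteed
    // since C++20, and what mainstream compilers do in earlier modes).
    return static_cast<std::int32_t>(h);
}
```

For example, "hello" gives 99162322, while "Aa" and "BB" both give 2112, a reminder that even a good formula produces collisions.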
So something like this:
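int HashTable::hash (string word) {
    // Horner's rule: computes s[0]*31^(n-1) + ... + s[n-1] without pow()
    unsigned int result = 0;  // unsigned, so overflow wraps instead of being undefined
    for (size_t i = 0; i < word.length(); ++i) {
        result = result * 31 + static_cast<unsigned char>(word[i]);
    }
    return result % SIZE;  // reduce to a valid table index, like the other two functions
}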
Source: Stackoverflow.com