[python] What does hash do in python?

I saw an example of code that where hash function is applied to a tuple. As a result it returns a negative integer. I wonder what does this function do? Google does not help. I found a page that explains how hash is calculated but it does not explain why we need this function.

This question is related to python hash

The answer is


You can use the Dictionary data type in python. It's very very similar to the hash—and it also supports nesting, similar to the to nested hash.

Example:

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
dict['Age'] = 8; # update existing entry
dict['School'] = "DPS School" # Add new entry

print ("dict['Age']: ", dict['Age'])
print ("dict['School']: ", dict['School'])

For more information, please reference this tutorial on the dictionary data type.


TL;DR:

Please refer to the glossary: hash() is used as a shortcut to comparing objects, an object is deemed hashable if it can be compared to other objects. that is why we use hash(). It's also used to access dict and set elements which are implemented as resizable hash tables in CPython.

Technical considerations

  • usually comparing objects (which may involve several levels of recursion) is expensive.
  • preferably, the hash() function is an order of magnitude (or several) less expensive.
  • comparing two hashes is easier than comparing two objects, this is where the shortcut is.

If you read about how dictionaries are implemented, they use hash tables, which means deriving a key from an object is a corner stone for retrieving objects in dictionaries in O(1). That's however very dependent on your hash function to be collision-resistant. The worst case for getting an item in a dictionary is actually O(n).

On that note, mutable objects are usually not hashable. The hashable property means you can use an object as a key. If the hash value is used as a key and the contents of that same object change, then what should the hash function return? Is it the same key or a different one? It depends on how you define your hash function.

Learning by example:

Imagine we have this class:

>>> class Person(object):
...     def __init__(self, name, ssn, address):
...         self.name = name
...         self.ssn = ssn
...         self.address = address
...     def __hash__(self):
...         return hash(self.ssn)
...     def __eq__(self, other):
...         return self.ssn == other.ssn
... 

Please note: this is all based on the assumption that the SSN never changes for an individual (don't even know where to actually verify that fact from authoritative source).

And we have Bob:

>>> bob = Person('bob', '1111-222-333', None)

Bob goes to see a judge to change his name:

>>> jim = Person('jim bo', '1111-222-333', 'sf bay area')

This is what we know:

>>> bob == jim
True

But these are two different objects with different memory allocated, just like two different records of the same person:

>>> bob is jim
False

Now comes the part where hash() is handy:

>>> dmv_appointments = {}
>>> dmv_appointments[bob] = 'tomorrow'

Guess what:

>>> dmv_appointments[jim] #?
'tomorrow'

From two different records you are able to access the same information. Now try this:

>>> dmv_appointments[hash(jim)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in __eq__
AttributeError: 'int' object has no attribute 'ssn'
>>> hash(jim) == hash(hash(jim))
True

What just happened? That's a collision. Because hash(jim) == hash(hash(jim)) which are both integers btw, we need to compare the input of __getitem__ with all items that collide. The builtin int does not have an ssn attribute so it trips.

>>> del Person.__eq__
>>> dmv_appointments[bob]
'tomorrow'
>>> dmv_appointments[jim]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.Person object at 0x7f611bd37110>

In this last example, I show that even with a collision, the comparison is performed, the objects are no longer equal, which means it successfully raises a KeyError.


The Python docs for hash() state:

Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup.

Python dictionaries are implemented as hash tables. So any time you use a dictionary, hash() is called on the keys that you pass in for assignment, or look-up.

Additionally, the docs for the dict type state:

Values that are not hashable, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys.


The hash is used by dictionaries and sets to quickly look up the object. A good starting point is Wikipedia's article on hash tables.