Hashing a dictionary

Question

For caching purposes I need to generate a cache key from GET arguments which are present in a dict.

Currently I'm using sha1(repr(sorted(my_dict.items()))) (sha1() is a convenience method that uses hashlib internally) but I'm curious if there's a better way.

User · Answer

Using sorted d items    isn t enough to get us a stable repr  Some of the values in d could be dictionaries too  and their keys will still come out in an arbitrary order  As long as all the keys are strings  I prefer to use   json dumps d  sort keys True    That said  if the hashes need to be stable across different machines or Python versions  I m not certain that this is bulletproof  You might want to add the separators and ensure ascii arguments to protect yourself from any changes to the defaults there  I d appreciate comments

User · Answer

To preserve key order  instead of hash str dictionary   or hash json dumps dictionary   I would prefer quick-and-dirty solution   from pprint import pformat h   hash pformat dictionary     It will work even for types like DateTime and more that are not JSON serializable

User · Answer

You can use the maps library to do this  Specifically  maps FrozenMap  import maps fm   maps FrozenMap my dict  hash fm    To install maps  just do   pip install maps     It handles the nested dict case too   import maps fm   maps FrozenMap recurse my dict  hash fm      Disclaimer  I am the author of the maps library

User · Answer

EDIT  If all your keys are strings  then before continuing to read this answer  please see Jack O Connor s significantly simpler  and faster  solution  which also works for hashing nested dictionaries    Although an answer has been accepted  the title of the question is  Hashing a python dictionary   and the answer is incomplete as regards that title    As regards the body of the question  the answer is complete    Nested Dictionaries  If one searches Stack Overflow for how to hash a dictionary  one might stumble upon this aptly titled question  and leave unsatisfied if one is attempting to hash multiply nested dictionaries   The answer above won t work in this case  and you ll have to implement some sort of recursive mechanism to retrieve the hash   Here is one such mechanism   import copy  def make hash o            Makes a hash from a dictionary  list  tuple or set to any level  that contains   only other hashable types  including any lists  tuples  sets  and   dictionaries            if isinstance o   set  tuple  list         return tuple  make hash e  for e in o          elif not isinstance o  dict        return hash o     new o   copy deepcopy o    for k  v in new o items        new o k    make hash v     return hash tuple frozenset sorted new o items         Bonus  Hashing Objects and Classes  The hash   function works great when you hash classes or instances   However  here is one issue I found with hash  as regards objects   class Foo object   pass foo   Foo   print  hash foo     1209812346789 foo a   1 print  hash foo     1209812346789   The hash is the same  even after I ve altered foo   This is because the identity of foo hasn t changed  so the hash is the same   If you want foo to hash differently depending on its current definition  the solution is to hash off whatever is actually changing   In this case  the   dict   attribute   class Foo object   pass foo   Foo   print  make hash foo   dict       1209812346789 foo a   1 print  make hash foo   dict       -78956430974785   Alas  when you attempt to do the same thing with the class itself   print  make hash Foo   dict       TypeError  unhashable type   dict proxy    The class   dict   property is not a normal dictionary   print  type Foo   dict       type  lt  dict proxy  gt    Here is a similar mechanism as previous that will handle classes appropriately   import copy  DictProxyType   type object   dict     def make hash o            Makes a hash from a dictionary  list  tuple or set to any level  that    contains only other hashable types  including any lists  tuples  sets  and   dictionaries   In the case where other kinds of objects  like classes  need    to be hashed  pass in a collection of object attributes that are pertinent     For example  a class can be hashed in this fashion       make hash  cls   dict    cls   name        A function can be hashed like so       make hash  fn   dict    fn   code              if type o     DictProxyType      o2          for k  v in o items          if not k startswith                o2 k    v     o   o2      if isinstance o   set  tuple  list         return tuple  make hash e  for e in o          elif not isinstance o  dict        return hash o     new o   copy deepcopy o    for k  v in new o items        new o k    make hash v     return hash tuple frozenset sorted new o items         You can use this to return a hash tuple of however many elements you d like     -7666086133114527897 print  make hash func   code         -7666086133114527897  3527539  print  make hash  func   code    func   dict          -7666086133114527897  3527539  -509551383349783210  print  make hash  func   code    func   dict    func   name        NOTE  all of the above code assumes Python 3 x   Did not test in earlier versions  although I assume make hash   will work in  say  2 7 2   As far as making the examples work  I do know that   func   code      should be replaced with   func func code

User · Answer

One way to approach the problem is to make a tuple of the dictionary s items   hash tuple my dict items

User · Answer

If your dictionary is not nested  you could make a frozenset with the dict s items and use hash     hash frozenset my dict items       This is much less computationally intensive than generating the JSON string or representation of the dictionary   UPDATE  Please see the comments below  why this approach might not produce a stable result

User · Answer

Updated from 2013 reply     None of the above answers seem reliable to me  The reason is the use of items    As far as I know  this comes out in a machine-dependent order   How about this instead   import hashlib  def dict hash the dict   ignore       if ignore     Sometimes you don t care about some items         interesting   the dict copy           for item in ignore              if item in interesting                  interesting pop item          the dict   interesting     result   hashlib sha1            s    sorted the dict items          hexdigest       return result

User · Answer

I do it like this   hash str my dict

User · Answer

While hash frozenset x items    and hash tuple sorted x items     work  that s doing a lot of work allocating and copying all the key-value pairs  A hash function really should avoid a lot of memory allocation  A little bit of math can help here  The problem with most hash functions is that they assume that order matters  To hash an unordered structure  you need a commutative operation  Multiplication doesn t work well as any element hashing to 0 means the whole product is 0  Bitwise  amp  and   tend towards all 0 s or 1 s  There are two good candidates  addition and xor  from functools import reduce from operator import xor  class hashable dict       def   hash   self           return reduce xor  map hash  self items     0         Alternative     def   hash   self           return sum map hash  self items      One point  xor works  in part  because dict guarantees keys are unique  And sum works because Python will bitwise truncate the results  If you want to hash a multiset  sum is preferable  With xor   a  would hash to the same value as  a  a  a  because x   x   x   x  If you really need the guarantees that SHA makes  this won t work for you  But to use a dictionary in a set  this will work fine  Python containers are resiliant to some collisions  and the underlying hash functions are pretty good

User · Answer

Here is a clearer solution   def freeze o     if isinstance o dict       return frozenset   k freeze v  for k v in o items    items       if isinstance o list       return tuple  freeze v  for v in o      return o   def make hash o               makes a hash out of anything that contains only list dict and hashable types including string and numeric types             return hash freeze o

User · Answer

The code below avoids using the Python hash   function because it will not provide hashes that are consistent across restarts of Python  see hash function in Python 3 3 returns different results between sessions   make hashable   will convert the object into nested tuples and make hash sha256   will also convert the repr   to a base64 encoded SHA256 hash   import hashlib import base64  def make hash sha256 o       hasher   hashlib sha256       hasher update repr make hashable o   encode        return base64 b64encode hasher digest    decode    def make hashable o       if isinstance o   tuple  list            return tuple  make hashable e  for e in o        if isinstance o  dict           return tuple sorted  k make hashable v   for k v in o items          if isinstance o   set  frozenset            return tuple sorted make hashable e  for e in o        return o  o   dict x 1 b 2 c  3 4 5  d  6 7   print make hashable o        b   2     c    3  4  5      d    6  7      x   1    print make hash sha256 o     fyt gK6D24H9Ugexw g3lbqnKZ0JAcgtNW rXIDeU2Y

User · Answer

Answers from this thread with the highest upvotes didn't work for me as their hashing functions give different results on different machines due to PYTHOPYTHONHASHSEED.

I adjusted all the hints from this thread and came up with a solution that works for me.

import collections
import hashlib
import json


def simplify_object(o):
    if isinstance(o, dict):
        ordered_dict = collections.OrderedDict(sorted(o.items()))
        for k, v in ordered_dict.items():
            v = simplify_object(v)
            ordered_dict[str(k)] = v
        o = ordered_dict
    elif isinstance(o, (list, tuple, set)):
        o = [simplify_object(el) for el in o]
    else:
        o = str(o).strip()
    return o


def make_hash(o):
    o = simplify_object(o)
    bytes_val = json.dumps(o, sort_keys=True, ensure_ascii=True, default=str)
    hash_val = hashlib.sha1(bytes_val.encode()).hexdigest()
    return hash_val

User · Answer

You could use the third-party frozendict module to freeze your dict and make it hashable   from frozendict import frozendict my dict   frozendict my dict    For handling nested objects  you could go with   import collections abc  def make hashable x       if isinstance x  collections abc Hashable           return x     elif isinstance x  collections abc Sequence           return tuple make hashable xi  for xi in x      elif isinstance x  collections abc Set           return frozenset make hashable xi  for xi in x      elif isinstance x  collections abc Mapping           return frozendict  k  make hashable v  for k  v in x items         else          raise TypeError  Don t know how to make    objects hashable  format type x    name       If you want to support more types  use functools singledispatch  Python 3 7     functools singledispatch def make hashable x       raise TypeError  Don t know how to make    objects hashable  format type x    name       make hashable register def   x  collections abc Hashable       return x   make hashable register def   x  collections abc Sequence       return tuple make hashable xi  for xi in x    make hashable register def   x  collections abc Set       return frozenset make hashable xi  for xi in x    make hashable register def   x  collections abc Mapping       return frozendict  k  make hashable v  for k  v in x items        add your own types here

[python] Hashing a dictionary?

The answer is

Examples related to python

Examples related to hash

Examples related to dictionary

Tags