Time complexity of accessing a Python dict

Question

I am writing a simple Python program.

My program seems to suffer from linear access to dictionaries: its run-time grows exponentially even though the algorithm is quadratic. I use a dictionary to memoize values. That seems to be a bottleneck.

The values I'm hashing are tuples of points. Each point is (x, y), 0 <= x, y <= 50. Each key in the dictionary is a tuple of 2-5 points: ((x1, y1), (x2, y2), (x3, y3), (x4, y4)).

The keys are read many times more often than they are written.

Am I correct that python dicts suffer from linear access times with such inputs?

As far as I know, sets have guaranteed logarithmic access times. How can I simulate dicts using sets (or something similar) in Python?

edit: As per request, here's a (simplified) version of the memoization function:

    def memoize(fun):
        memoized = {}
        def memo(*args):
            key = args
            if not key in memoized:
                memoized[key] = fun(*args)
            return memoized[key]
        return memo

User · Accepted Answer

See Time Complexity (Python wiki). The python dict is a hashmap; its worst case is therefore O(n) if the hash function is bad and results in a lot of collisions. However, that is a very rare case where every item added has the same hash and so is added to the same chain, which for a major Python implementation would be extremely unlikely. The average time complexity is of course O(1).

The best method would be to check and take a look at the hashes of the objects you are using. The CPython dict uses int PyObject_Hash(PyObject *o), which is the equivalent of hash(o).

After a quick check, I have not yet managed to find two tuples that hash to the same value, which would indicate that the lookup is O(1):

    l = []
    for x in range(0, 50):
        for y in range(0, 50):
            if hash((x, y)) in l:
                print("Fail: ", (x, y))
            l.append(hash((x, y)))
    print("Test Finished")

CodePad (available for 24 hours)
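A slightly more general version of that check, as a rough sketch: it records hashes in a dict instead of a list (so the check itself stays fast) and reports which distinct keys share a hash. The helper name `find_hash_collisions` is made up for illustration, and the inclusive bounds 0..50 are an assumption taken from the question. Results depend on the interpreter and version, so treat the counts as something to measure, not a guarantee.

```python
from itertools import product

def find_hash_collisions(keys):
    """Return pairs of distinct keys whose hash() values are equal."""
    seen = {}            # hash value -> first key seen with that hash
    collisions = []
    for key in keys:
        h = hash(key)
        if h in seen and seen[h] != key:
            collisions.append((seen[h], key))
        else:
            seen.setdefault(h, key)
    return collisions

# Assumption: points use inclusive bounds 0 <= x, y <= 50.
points = list(product(range(51), repeat=2))

single = find_hash_collisions(points)
# Keys in the question are tuples of points; sample 2-point keys only,
# since the full key space is far too large to enumerate.
pairs = find_hash_collisions(product(points[:200], repeat=2))
print(len(single), "collisions among points,", len(pairs), "among sampled pairs")
```

A few collisions would not by themselves make lookups linear; only a large share of keys landing on the same hash would.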

User · Answer

You are not correct. dict access is unlikely to be your problem here. It is almost certainly O(1), unless you have some very weird inputs or a very bad hashing function. Paste some sample code from your application for a better diagnosis.
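If you want to see this for yourself, here is a minimal timing sketch. The function name `time_lookups` and the key shape are assumptions for illustration, not your actual data. If lookups were linear, the per-lookup time would grow roughly in step with the dict size; on a typical CPython build it should stay about flat.

```python
import timeit

def time_lookups(n, repeats=100_000):
    """Average seconds per lookup in a dict holding n tuple keys."""
    table = {(i, i + 1): i for i in range(n)}
    probe = (n // 2, n // 2 + 1)      # a key known to be present
    total = timeit.timeit(lambda: table[probe], number=repeats)
    return total / repeats

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, f"{time_lookups(n) * 1e9:.0f} ns per lookup")
```

The absolute numbers include the overhead of the lambda call, so only the trend across sizes is meaningful.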

User · Answer

It would be easier to make suggestions if you provided example code and data.

Accessing the dictionary is unlikely to be a problem as that operation is O(1) on average, and O(N) amortized worst case. It's possible that the built-in hashing functions are experiencing collisions for your data. If you're having problems with the built-in hashing function, you can provide your own.

> Python's dictionary implementation reduces the average complexity of dictionary lookups to O(1) by requiring that key objects provide a "hash" function. Such a hash function takes the information in a key object and uses it to produce an integer, called a hash value. This hash value is then used to determine which "bucket" this (key, value) pair should be placed into.

You can override the __hash__ method in your class to implement a custom hash function like this:

    def __hash__(self):
        return hash(str(self))

Depending on what your data actually looks like, you might be able to come up with a faster hash function that has fewer collisions than the standard function. However, this is unlikely. See the Python Wiki page on Dictionary Keys for more information.
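For completeness, a custom hash only works as a dict key if equality is defined consistently with it. Below is a minimal sketch with a hypothetical `Point` class (not from the question); the formula `x * 51 + y` assumes the inclusive bounds 0 <= x, y <= 50, in which case every point gets a distinct hash.

```python
class Point:
    """Hypothetical 2-D point usable as a dict key."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Assumes 0 <= x, y <= 50, so x * 51 + y is unique per point.
        return self.x * 51 + self.y

cache = {Point(1, 2): "a"}
print(cache[Point(1, 2)])    # equal points find the same entry
```

Objects used this way should not be mutated while they are stored as keys, because the hash would change and the entry would no longer be found.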

User · Answer

As others have pointed out, accessing dicts in Python is fast. They are probably the best-oiled data structure in the language, given their central role. The problem lies elsewhere.

How many tuples are you memoizing? Have you considered the memory footprint? Perhaps you are spending all your time in the memory allocator or paging memory.
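To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes inclusive bounds 0 <= x, y <= 50 and counts ordered tuples of points, which is only an upper bound on what your program actually stores; `key_bytes` is a hypothetical helper and its result varies by Python version and platform.

```python
import sys

POINTS = 51 * 51                  # assumes 0 <= x, y <= 50 inclusive

# Upper bound on distinct keys of each length (ordered tuples of points).
for k in range(2, 6):
    print(k, "points:", POINTS ** k, "possible keys")

def key_bytes(key):
    """Rough size of one key: the outer tuple plus its point tuples.
    The small ints are ignored here since CPython caches them."""
    return sys.getsizeof(key) + sum(sys.getsizeof(p) for p in key)

sample = ((1, 2), (3, 4), (5, 6), (7, 8))
print(key_bytes(sample), "bytes for a 4-point key, excluding the dict slot")
```

Even the 2-point key space is in the millions, so a cache that fills a meaningful fraction of the larger spaces would not fit in memory.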

User · Answer

To answer your specific questions:

Q1: "Am I correct that python dicts suffer from linear access times with such inputs?"

A1: If you mean that average lookup time is O(N) where N is the number of entries in the dict, then it is highly likely that you are wrong. If you are correct, the Python community would very much like to know under what circumstances you are correct, so that the problem can be mitigated or at least warned about. Neither "sample" code nor "simplified" code are useful. Please show actual code and data that reproduce the problem. The code should be instrumented with things like number of dict items and number of dict accesses for each P where P is the number of points in the key (2 <= P <= 5).

Q2: "As far as I know, sets have guaranteed logarithmic access times. How can I simulate dicts using sets (or something similar) in Python?"

A2: Sets have guaranteed logarithmic access times in what context? There is no such guarantee for Python implementations. Recent CPython versions in fact use a cut-down dict implementation (keys only, no values), so the expectation is average O(1) behaviour. How can you simulate dicts with sets or something similar in any language? Short answer: with extreme difficulty, if you want any functionality beyond dict.has_key(key).
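The kind of instrumentation meant in A1 could look something like the sketch below. It follows the shape of the memoizer in the question but counts hits and misses per key length P. The names `memoize_instrumented` and `span` are made up for illustration, and `span` is only a stand-in for the real function.

```python
from collections import Counter

def memoize_instrumented(fun):
    """Memoizer that counts hits and misses per key length (P)."""
    memoized = {}
    stats = {"hits": Counter(), "misses": Counter()}

    def memo(*args):
        if args in memoized:
            stats["hits"][len(args)] += 1
        else:
            stats["misses"][len(args)] += 1
            memoized[args] = fun(*args)
        return memoized[args]

    memo.stats = stats        # exposed for inspection after a run
    memo.cache = memoized
    return memo

@memoize_instrumented
def span(*points):            # hypothetical stand-in for the real function
    return max(x for x, _ in points) - min(x for x, _ in points)

span((0, 0), (5, 1))
span((0, 0), (5, 1))
span((0, 0), (5, 1), (9, 9))
print(len(span.cache), dict(span.stats["hits"]), dict(span.stats["misses"]))
# prints: 2 {2: 1} {2: 1, 3: 1}
```

If the miss counts keep growing much faster than expected for a quadratic algorithm, the cost is in the number of distinct keys, not in the dict lookups.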

User · Answer

> My program seems to suffer from linear access to dictionaries: its run-time grows exponentially even though the algorithm is quadratic. I use a dictionary to memoize values. That seems to be a bottleneck.

This is evidence of a bug in your memoization method.
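One way to test that suspicion, as a sketch: swap the hand-written memoizer for the standard library's `functools.lru_cache` and look at `cache_info()`. The function `grid_paths` below is a hypothetical example, not the code from the question. With working memoization the number of misses equals the number of distinct argument tuples actually evaluated, so it should grow polynomially with the input size.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def grid_paths(rows, cols):
    """Hypothetical example: count monotone lattice paths in a grid."""
    if rows == 0 or cols == 0:
        return 1
    return grid_paths(rows - 1, cols) + grid_paths(rows, cols - 1)

print(grid_paths(12, 12))
info = grid_paths.cache_info()
print("misses:", info.misses, "hits:", info.hits)
```

If the same program is fast with `lru_cache` and slow with your own decorator, the decorator (or the way keys are built) is the place to look.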
