What is the best way to implement nested dictionaries?

Question

I have a data structure which essentially amounts to a nested dictionary. Let's say it looks like this:

    {'new jersey': {'mercer county': {'plumbers': 3,
                                      'programmers': 81},
                    'middlesex county': {'programmers': 81,
                                         'salesmen': 62}},
     'new york': {'queens county': {'plumbers': 9,
                                    'salesmen': 36}}}

Now, maintaining and creating this is pretty painful; every time I have a new state/county/profession I have to create the lower layer dictionaries via obnoxious try/catch blocks. Moreover, I have to create annoying nested iterators if I want to go over all the values.

I could also use tuples as keys, like such:

    {('new jersey', 'mercer county', 'plumbers'): 3,
     ('new jersey', 'mercer county', 'programmers'): 81,
     ('new jersey', 'middlesex county', 'programmers'): 81,
     ('new jersey', 'middlesex county', 'salesmen'): 62,
     ('new york', 'queens county', 'plumbers'): 9,
     ('new york', 'queens county', 'salesmen'): 36}

This makes iterating over the values very simple and natural, but it is more syntactically painful to do things like aggregations and looking at subsets of the dictionary (e.g. if I just want to go state-by-state).

Basically, sometimes I want to think of a nested dictionary as a flat dictionary, and sometimes I want to think of it indeed as a complex hierarchy. I could wrap this all in a class, but it seems like someone might have done this already. Alternatively, it seems like there might be some really elegant syntactical constructions to do this.

How could I do this better?

Addendum: I'm aware of setdefault() but it doesn't really make for clean syntax. Also, each sub-dictionary you create still needs to have setdefault() manually set.

User · Answer

You could create a YAML file and read it in using PyYaml.

Step 1: Create a YAML file, "employment.yml":

new jersey:
  mercer county:
    plumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Step 2: Read it in Python:

import yaml
# The with-block closes the file even if parsing raises an error.
with open("employment.yml") as file_handle:
    my_shnazzy_dictionary = yaml.safe_load(file_handle)

and now my_shnazzy_dictionary has all your values. If you needed to do this on the fly, you can create the YAML as a string and feed that into yaml.safe_load(...).

User · Answer

    class AutoVivification(dict):
        """Implementation of perl's autovivification feature."""
        def __getitem__(self, item):
            try:
                return dict.__getitem__(self, item)
            except KeyError:
                value = self[item] = type(self)()
                return value

Testing:

    a = AutoVivification()

    a[1][2][3] = 4
    a[1][3][3] = 5
    a[1][2]['test'] = 6

    print(a)

Output:

    {1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}

User · Answer

You can use recursion in lambdas and defaultdict, no need to define names:

    from collections import defaultdict

    a = defaultdict((lambda f: f(f))(lambda g: lambda: defaultdict(g(g))))

Here's an example:

    >>> a['new jersey']['mercer county']['plumbers'] = 3
    >>> a['new jersey']['middlesex county']['programmers'] = 81
    >>> a['new jersey']['mercer county']['programmers'] = 81
    >>> a['new jersey']['middlesex county']['salesmen'] = 62
    >>> a
    defaultdict(<function __main__.<lambda>>,
            {'new jersey': defaultdict(<function __main__.<lambda>>,
                         {'mercer county': defaultdict(<function __main__.<lambda>>,
                                      {'plumbers': 3, 'programmers': 81}),
                          'middlesex county': defaultdict(<function __main__.<lambda>>,
                                      {'programmers': 81, 'salesmen': 62})})})

User · Answer

Since you have a star-schema design, you might want to structure it more like a relational table and less like a dictionary.

    import collections

    class Jobs(object):
        def __init__(self, state, county, title, count):
            self.state = state
            self.county = county  # was assigned to self.count by mistake
            self.title = title
            self.count = count

    facts = [
        Jobs('new jersey', 'mercer county', 'plumbers', 3),
        # ...
    ]

    def groupBy(facts, name):
        total = collections.defaultdict(int)
        for f in facts:
            key = getattr(f, name)
            total[key] += f.count
        return total  # hand the aggregated counts back to the caller

That kind of thing can go a long way to creating a data warehouse-like design without the SQL overheads.
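To make the grouping idea above concrete, here is a minimal runnable sketch. The names `Job`, `FACTS` and `group_by` are illustrative choices, not part of the answer, and the rows are simply the question's example data; a dataclass stands in for the hand-written class.

```python
import collections
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One fact row: a head count for a (state, county, title) combination."""
    state: str
    county: str
    title: str
    count: int


# Rows copied from the question's example data.
FACTS = [
    Job("new jersey", "mercer county", "plumbers", 3),
    Job("new jersey", "mercer county", "programmers", 81),
    Job("new jersey", "middlesex county", "programmers", 81),
    Job("new jersey", "middlesex county", "salesmen", 62),
    Job("new york", "queens county", "plumbers", 9),
    Job("new york", "queens county", "salesmen", 36),
]


def group_by(facts, name):
    """Sum `count` over all facts that share the same value of attribute `name`."""
    totals = collections.defaultdict(int)
    for fact in facts:
        totals[getattr(fact, name)] += fact.count
    return dict(totals)


by_state = group_by(FACTS, "state")
by_title = group_by(FACTS, "title")
print(by_state)
print(by_title)
```

Because every row carries all three dimensions, grouping by a different attribute is only a change of the `name` argument.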

User · Answer

For easy iterating over your nested dictionary, why not just write a simple generator?

    def each_job(my_dict):
        for state, a in my_dict.items():
            for county, b in a.items():
                for job, value in b.items():
                    yield {
                        'state'  : state,
                        'county' : county,
                        'job'    : job,
                        'value'  : value
                    }

So then, if you have your complicated nested dictionary, iterating over it becomes simple:

    for r in each_job(my_dict):
        print("There are %d %s in %s, %s" % (r['value'], r['job'], r['county'], r['state']))

Obviously your generator can yield whatever format of data is useful to you.

Why are you using try/catch blocks to read the tree? It's easy enough (and probably safer) to query whether a key exists in a dict before trying to retrieve it. A function using guard clauses might look like this:

    if 'new jersey' not in my_dict:
        return False

    nj_dict = my_dict['new jersey']
    ...

Or, a perhaps somewhat verbose method, is to use the get method:

    value = my_dict.get('new jersey', {}).get('middlesex county', {}).get('salesmen', 0)

But for a somewhat more succinct way, you might want to look at using a collections.defaultdict, which has been part of the standard library since Python 2.5.

    import collections

    def state_struct(): return collections.defaultdict(county_struct)
    def county_struct(): return collections.defaultdict(job_struct)
    def job_struct(): return 0

    my_dict = collections.defaultdict(state_struct)

    print(my_dict['new jersey']['middlesex county']['salesmen'])

I'm making assumptions about the meaning of your data structure here, but it should be easy to adjust for what you actually want to do.

User · Answer

collections.defaultdict can be sub-classed to make a nested dict. Then add any useful iteration methods to that class.

    >>> from collections import defaultdict
    >>> class nesteddict(defaultdict):
    ...     def __init__(self):
    ...         defaultdict.__init__(self, nesteddict)
    ...     def walk(self):
    ...         for key, value in self.items():
    ...             if isinstance(value, nesteddict):
    ...                 for tup in value.walk():
    ...                     yield (key,) + tup
    ...             else:
    ...                 yield key, value
    ...
    >>> nd = nesteddict()
    >>> nd['new jersey']['mercer county']['plumbers'] = 3
    >>> nd['new jersey']['mercer county']['programmers'] = 81
    >>> nd['new jersey']['middlesex county']['programmers'] = 81
    >>> nd['new jersey']['middlesex county']['salesmen'] = 62
    >>> nd['new york']['queens county']['plumbers'] = 9
    >>> nd['new york']['queens county']['salesmen'] = 36
    >>> for tup in nd.walk():
    ...     print(tup)
    ...
    ('new jersey', 'mercer county', 'programmers', 81)
    ('new jersey', 'mercer county', 'plumbers', 3)
    ('new jersey', 'middlesex county', 'programmers', 81)
    ('new jersey', 'middlesex county', 'salesmen', 62)
    ('new york', 'queens county', 'salesmen', 36)
    ('new york', 'queens county', 'plumbers', 9)

User · Answer

I have a similar thing going. I have a lot of cases where I do:

    thedict = {}
    for item in ('foo', 'bar', 'baz'):
      mydict = thedict.get(item, {})
      mydict = get_value_for(item)
      thedict[item] = mydict

But going many levels deep. It's the ".get(item, {})" that's the key, as it hands back a new dictionary if there isn't one already. Meanwhile, I've been thinking of ways to deal with this better. Right now, there's a lot of

    value = mydict.get('foo', {}).get('bar', {}).get('baz', 0)

So instead, I made:

    def dictgetter(thedict, default, *args):
      totalargs = len(args)
      for i, arg in enumerate(args):
        if i + 1 == totalargs:
          # last key: fall back to the caller's default
          thedict = thedict.get(arg, default)
        else:
          # intermediate key: fall back to an empty dict so .get keeps working
          thedict = thedict.get(arg, {})
      return thedict

Which has the same effect if you do:

    value = dictgetter(mydict, 0, 'foo', 'bar', 'baz')

Better? I think so.

User · Answer

I find setdefault quite useful. It checks if a key is present and adds it if not:

    d = {}
    d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3

setdefault always returns the value stored under the relevant key, so you are actually updating the values of 'd' in place.

When it comes to iterating, I'm sure you could write a generator easily enough if one doesn't already exist in Python:

    def iterateStates(d):
        # Let's count up the total number of "plumbers" / "dentists" / etc.
        # across all counties and states
        job_totals = {}

        # I guess this is the annoying nested stuff you were talking about?
        for (state, counties) in d.items():
            for (county, jobs) in counties.items():
                for (job, num) in jobs.items():
                    # If job isn't already in job_totals, default it to zero
                    job_totals[job] = job_totals.get(job, 0) + num

        # Now return an iterator of (job, number) tuples
        return job_totals.items()

    # Display all jobs
    for (job, num) in iterateStates(d):
        # %d takes the count and %s the job name, so num must come first
        print("There are %d %s in total" % (num, job))

User · Answer

Just because I haven't seen one this small, here's a dict that gets as nested as you like, no sweat:

    from collections import defaultdict

    # yo dawg, i heard you liked dicts
    def yodict():
        return defaultdict(yodict)

User · Answer

As others have suggested, a relational database could be more useful to you. You can use an in-memory sqlite3 database as a data structure to create tables and then query them.

    import sqlite3

    c = sqlite3.connect(':memory:')
    c.execute('CREATE TABLE jobs (state, county, title, count)')

    c.executemany('insert into jobs values (?, ?, ?, ?)', [
        ('New Jersey', 'Mercer County',    'Programmers', 81),
        ('New Jersey', 'Mercer County',    'Plumbers',     3),
        ('New Jersey', 'Middlesex County', 'Programmers', 81),
        ('New Jersey', 'Middlesex County', 'Salesmen',    62),
        ('New York',   'Queens County',    'Salesmen',    36),
        ('New York',   'Queens County',    'Plumbers',     9),
    ])

    # some example queries
    print(list(c.execute("SELECT * FROM jobs WHERE county = 'Queens County'")))
    print(list(c.execute("SELECT SUM(count) FROM jobs WHERE title = 'Programmers'")))

This is just a simple example. You could define separate tables for states, counties and job titles.

User · Answer

What is the best way to implement nested dictionaries in Python?

This is a bad idea, don't do it. Instead, use a regular dictionary and use dict.setdefault where apropos, so when keys are missing under normal usage you get the expected KeyError. If you insist on getting this behavior, here's how to shoot yourself in the foot:

Implement __missing__ on a dict subclass to set and return a new instance.

This approach has been available (and documented) since Python 2.5, and (particularly valuable to me) it pretty prints just like a normal dict, instead of the ugly printing of an autovivified defaultdict:

    class Vividict(dict):
        def __missing__(self, key):
            value = self[key] = type(self)()  # retain local pointer to value
            return value                      # faster to return than dict lookup

(Note self[key] is on the left-hand side of assignment, so there's no recursion here.)

And say you have some data:

    data = {('new jersey', 'mercer county', 'plumbers'): 3,
            ('new jersey', 'mercer county', 'programmers'): 81,
            ('new jersey', 'middlesex county', 'programmers'): 81,
            ('new jersey', 'middlesex county', 'salesmen'): 62,
            ('new york', 'queens county', 'plumbers'): 9,
            ('new york', 'queens county', 'salesmen'): 36}

Here's our usage code:

    vividict = Vividict()
    for (state, county, occupation), number in data.items():
        vividict[state][county][occupation] = number

And now:

    >>> import pprint
    >>> pprint.pprint(vividict, width=40)
    {'new jersey': {'mercer county': {'plumbers': 3,
                                      'programmers': 81},
                    'middlesex county': {'programmers': 81,
                                         'salesmen': 62}},
     'new york': {'queens county': {'plumbers': 9,
                                    'salesmen': 36}}}

Criticism

A criticism of this type of container is that if the user misspells a key, our code could fail silently:

    >>> vividict['new york']['queens counyt']
    {}

And additionally now we'd have a misspelled county in our data:

    >>> pprint.pprint(vividict, width=40)
    {'new jersey': {'mercer county': {'plumbers': 3,
                                      'programmers': 81},
                    'middlesex county': {'programmers': 81,
                                         'salesmen': 62}},
     'new york': {'queens county': {'plumbers': 9,
                                    'salesmen': 36},
                  'queens counyt': {}}}

Explanation

We're just providing another nested instance of our class Vividict whenever a key is accessed but missing. (Returning the value assignment is useful because it avoids us additionally calling the getter on the dict, and unfortunately, we can't return it as it is being set.)

Note, these are the same semantics as the most upvoted answer but in half the lines of code - nosklo's implementation:

    class AutoVivification(dict):
        """Implementation of perl's autovivification feature."""
        def __getitem__(self, item):
            try:
                return dict.__getitem__(self, item)
            except KeyError:
                value = self[item] = type(self)()
                return value

Demonstration of Usage

Below is just an example of how this dict could be easily used to create a nested dict structure on the fly. This can quickly create a hierarchical tree structure as deeply as you might want to go.

    import pprint

    class Vividict(dict):
        def __missing__(self, key):
            value = self[key] = type(self)()
            return value

    d = Vividict()

    d['foo']['bar']
    d['foo']['baz']
    d['fizz']['buzz']
    d['primary']['secondary']['tertiary']['quaternary']
    pprint.pprint(d)

Which outputs:

    {'fizz': {'buzz': {}},
     'foo': {'bar': {}, 'baz': {}},
     'primary': {'secondary': {'tertiary': {'quaternary': {}}}}}

And as the last line shows, it pretty prints beautifully and in order for manual inspection. But if you want to visually inspect your data, implementing __missing__ to set a new instance of its class to the key and return it is a far better solution.

Other alternatives, for contrast

dict.setdefault

Although the asker thinks this isn't clean, I find it preferable to the Vividict myself.

    d = {}  # or dict()
    for (state, county, occupation), number in data.items():
        d.setdefault(state, {}).setdefault(county, {})[occupation] = number

And now:

    >>> pprint.pprint(d, width=40)
    {'new jersey': {'mercer county': {'plumbers': 3,
                                      'programmers': 81},
                    'middlesex county': {'programmers': 81,
                                         'salesmen': 62}},
     'new york': {'queens county': {'plumbers': 9,
                                    'salesmen': 36}}}

A misspelling would fail noisily, and not clutter our data with bad information:

    >>> d['new york']['queens counyt']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'queens counyt'

Additionally, I think setdefault works great when used in loops and you don't know what you're going to get for keys, but repetitive usage becomes quite burdensome, and I don't think anyone would want to keep up the following:

    d = dict()

    d.setdefault('foo', {}).setdefault('bar', {})
    d.setdefault('foo', {}).setdefault('baz', {})
    d.setdefault('fizz', {}).setdefault('buzz', {})
    d.setdefault('primary', {}).setdefault('secondary', {}).setdefault('tertiary', {}).setdefault('quaternary', {})

Another criticism is that setdefault requires a new instance whether it is used or not. However, Python (or at least CPython) is rather smart about handling unused and unreferenced new instances; for example, it reuses the location in memory:

    >>> id({}), id({}), id({})
    (523575344, 523575344, 523575344)

An auto-vivified defaultdict

This is a neat looking implementation, and usage in a script that you're not inspecting the data on would be as useful as implementing __missing__:

    from collections import defaultdict

    def vivdict():
        return defaultdict(vivdict)

But if you need to inspect your data, the results of an auto-vivified defaultdict populated with data in the same way look like this:

    >>> d = vivdict(); d['foo']['bar']; d['foo']['baz']; d['fizz']['buzz']; d['primary']['secondary']['tertiary']['quaternary']; import pprint;
    >>> pprint.pprint(d)
    defaultdict(<function vivdict at 0x17B01870>, {'foo': defaultdict(<function vivdict
    at 0x17B01870>, {'baz': defaultdict(<function vivdict at 0x17B01870>, {}), 'bar':
    defaultdict(<function vivdict at 0x17B01870>, {})}), 'primary': defaultdict(<function
    vivdict at 0x17B01870>, {'secondary': defaultdict(<function vivdict at 0x17B01870>,
    {'tertiary': defaultdict(<function vivdict at 0x17B01870>, {'quaternary': defaultdict(
    <function vivdict at 0x17B01870>, {})})})}), 'fizz': defaultdict(<function vivdict at
    0x17B01870>, {'buzz': defaultdict(<function vivdict at 0x17B01870>, {})})})

This output is quite inelegant, and the results are quite unreadable. The solution typically given is to recursively convert back to a dict for manual inspection. This non-trivial solution is left as an exercise for the reader.

Performance

Finally, let's look at performance. I'm subtracting the costs of instantiation.

    >>> import timeit
    >>> min(timeit.repeat(lambda: {}.setdefault('foo', {}))) - min(timeit.repeat(lambda: {}))
    0.13612580299377441
    >>> min(timeit.repeat(lambda: vivdict()['foo'])) - min(timeit.repeat(lambda: vivdict()))
    0.2936999797821045
    >>> min(timeit.repeat(lambda: Vividict()['foo'])) - min(timeit.repeat(lambda: Vividict()))
    0.5354437828063965
    >>> min(timeit.repeat(lambda: AutoVivification()['foo'])) - min(timeit.repeat(lambda: AutoVivification()))
    2.138362169265747

Based on performance, dict.setdefault works the best. I'd highly recommend it for production code, in cases where you care about execution speed.

If you need this for interactive use (in an IPython notebook, perhaps) then performance doesn't really matter - in which case, I'd go with Vividict for readability of the output. Compared to the AutoVivification object (which uses __getitem__ instead of __missing__, which was made for this purpose) it is far superior.

Conclusion

Implementing __missing__ on a subclassed dict to set and return a new instance is slightly more difficult than alternatives but has the benefits of

- easy instantiation
- easy data population
- easy data viewing

and because it is less complicated and more performant than modifying __getitem__, it should be preferred to that method.

Nevertheless, it has drawbacks:

- Bad lookups will fail silently.
- The bad lookup will remain in the dictionary.

Thus I personally prefer setdefault to the other solutions, and have in every situation where I have needed this sort of behavior.
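The recursive conversion back to a plain dict mentioned above can be sketched roughly as follows. The helper name `to_plain_dict` is hypothetical, and this version builds a new structure rather than modifying the defaultdict in place, which is one design choice among several.

```python
from collections import defaultdict
import pprint


def vivdict():
    """Auto-vivifying defaultdict, as in the answer above."""
    return defaultdict(vivdict)


def to_plain_dict(d):
    """Return a copy of `d` with every nested defaultdict replaced by a dict.

    Hypothetical helper for inspection only; non-defaultdict values are
    kept as they are.
    """
    return {
        key: to_plain_dict(value) if isinstance(value, defaultdict) else value
        for key, value in d.items()
    }


d = vivdict()
d['foo']['bar']
d['primary']['secondary'] = 1

plain = to_plain_dict(d)
pprint.pprint(plain)
```

The copy is made of ordinary dicts, so a misspelled lookup on it raises KeyError instead of silently creating a new entry.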

User · Answer

This is a function that returns a nested dictionary of arbitrary depth:

    from collections import defaultdict
    def make_dict():
        return defaultdict(make_dict)

Use it like this:

    d = defaultdict(make_dict)
    d["food"]["meat"] = "beef"
    d["food"]["veggie"] = "corn"
    d["food"]["sweets"] = "ice cream"
    d["animal"]["pet"]["dog"] = "collie"
    d["animal"]["pet"]["cat"] = "tabby"
    d["animal"]["farm animal"] = "chicken"

Iterate through everything with something like this:

    def iter_all(d, depth=1):
        for k, v in d.items():
            print("-" * depth, k)
            if type(v) is defaultdict:
                iter_all(v, depth + 1)
            else:
                print("-" * (depth + 1), v)

    iter_all(d)

This prints out:

    - food
    -- sweets
    --- ice cream
    -- meat
    --- beef
    -- veggie
    --- corn
    - animal
    -- pet
    --- dog
    ---- collie
    --- cat
    ---- tabby
    -- farm animal
    --- chicken

You might eventually want to make it so that new items cannot be added to the dict. It's easy to recursively convert all these defaultdicts to normal dicts.

    def dictify(d):
        for k, v in d.items():
            if isinstance(v, defaultdict):
                d[k] = dictify(v)
        return dict(d)

User · Answer

If the number of nesting levels is small, I use collections.defaultdict for this:

    from collections import defaultdict

    def nested_dict_factory():
      return defaultdict(int)
    def nested_dict_factory2():
      return defaultdict(nested_dict_factory)
    db = defaultdict(nested_dict_factory2)

    db['new jersey']['mercer county']['plumbers'] = 3
    db['new jersey']['mercer county']['programmers'] = 81

Using defaultdict like this avoids a lot of messy setdefault(), get(), etc.

User · Answer

defaultdict() is your friend!

For a two dimensional dictionary you can do:

    from collections import defaultdict

    d = defaultdict(defaultdict)
    d[1][2] = 3

For more dimensions you can:

    d = defaultdict(lambda: defaultdict(defaultdict))
    d[1][2][3] = 4
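The nesting above has to be written out once per dimension. A small sketch of how it might be generalised to a fixed number of levels follows; `nested_defaultdict` and its `leaf_factory` parameter are hypothetical names, and the innermost level here gets a real default (int by default), which differs slightly from the snippets above where the innermost defaultdict has no factory.

```python
from collections import defaultdict


def nested_defaultdict(depth, leaf_factory=int):
    """Build a defaultdict that auto-creates `depth` levels of keys.

    Hypothetical helper: the innermost level produces values from
    `leaf_factory`, e.g. 0 for int or [] for list.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        return defaultdict(leaf_factory)
    # Each missing key at this level creates a structure one level shallower.
    return defaultdict(lambda: nested_defaultdict(depth - 1, leaf_factory))


d = nested_defaultdict(3)
d[1][2][3] = 4
d['new jersey']['mercer county']['plumbers'] += 3
print(d[1][2][3], d['new jersey']['mercer county']['plumbers'])
```

Unlike the fully recursive variants, this stops vivifying after `depth` levels, so the leaves behave like ordinary values.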

User · Answer

For the following (copied from above), is there a way to implement the append function? I am trying to use a nested dictionary to store values as arrays.

    class Vividict(dict):
        def __missing__(self, key):
            value = self[key] = type(self)()  # retain local pointer to value
            return value

My current implementation is as follows:

    totalGeneHash = Vividict()

    for keys in GenHash:
        for second in GenHash[keys]:
            if keys in sampleHash:
                total_val = GenHash[keys][second]
                totalGeneHash[gene][keys].append(total_val)

This is the error I get:

    AttributeError: 'Vividict' object has no attribute 'append'
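The error arises because every missing key in a Vividict produces another Vividict, and a dict has no append method. One possible way around it, assuming the structure always has exactly two key levels with a list at the bottom, is a defaultdict whose innermost factory is list. The data below (`gen_hash`, `sample_hash`, `gene`) are made-up stand-ins for the variables in the snippet above.

```python
from collections import defaultdict

# Hypothetical stand-ins for GenHash, sampleHash and gene from the snippet above.
gen_hash = {"geneA": {"s1": 5, "s2": 7}, "geneB": {"s1": 1}}
sample_hash = {"geneA": True}
gene = "gene-1"

# Two key levels, then a list at the leaf, so .append always finds a list.
total_gene_hash = defaultdict(lambda: defaultdict(list))

for keys in gen_hash:
    for second in gen_hash[keys]:
        if keys in sample_hash:
            total_val = gen_hash[keys][second]
            total_gene_hash[gene][keys].append(total_val)

print(dict(total_gene_hash[gene]))
```

With a plain dict or a Vividict, `d.setdefault(gene, {}).setdefault(keys, []).append(total_val)` would be another option, at the cost of spelling out the levels on every call.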

User · Answer

Unless your dataset is going to stay pretty small, you might want to consider using a relational database. It will do exactly what you want: make it easy to add counts, select subsets of counts, and even aggregate counts by state, county, occupation, or any combination of these.
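As a rough illustration of the aggregation point, here is a sketch using the standard-library sqlite3 module with an in-memory database. The table name `jobs` and the column name `headcount` are assumptions made for this example, and the rows are the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (state TEXT, county TEXT, title TEXT, headcount INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("new jersey", "mercer county", "plumbers", 3),
        ("new jersey", "mercer county", "programmers", 81),
        ("new jersey", "middlesex county", "programmers", 81),
        ("new jersey", "middlesex county", "salesmen", 62),
        ("new york", "queens county", "plumbers", 9),
        ("new york", "queens county", "salesmen", 36),
    ],
)

# Aggregate by a single dimension: each result row is (state, total).
by_state = dict(
    conn.execute("SELECT state, SUM(headcount) FROM jobs GROUP BY state")
)

# Aggregate by a combination of dimensions.
by_state_title = {
    (state, title): total
    for state, title, total in conn.execute(
        "SELECT state, title, SUM(headcount) FROM jobs GROUP BY state, title"
    )
}

conn.close()
print(by_state)
```

Changing the GROUP BY columns is all it takes to aggregate along a different combination of dimensions.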

User · Answer

As for "obnoxious try/catch blocks":

    d = {}
    d.setdefault('key', {}).setdefault('inner key', {})['inner inner key'] = 'value'
    print(d)

yields

    {'key': {'inner key': {'inner inner key': 'value'}}}

You can use this to convert from your flat dictionary format to structured format:

    fd = {('new jersey', 'mercer county', 'plumbers'): 3,
     ('new jersey', 'mercer county', 'programmers'): 81,
     ('new jersey', 'middlesex county', 'programmers'): 81,
     ('new jersey', 'middlesex county', 'salesmen'): 62,
     ('new york', 'queens county', 'plumbers'): 9,
     ('new york', 'queens county', 'salesmen'): 36}

    for (k1, k2, k3), v in fd.items():
        d.setdefault(k1, {}).setdefault(k2, {})[k3] = v
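The conversion loop above is written for exactly three key levels. A minimal sketch of the same setdefault idea for key tuples of any length might look like this; the function name `nest` is hypothetical.

```python
def nest(flat):
    """Convert {(k1, ..., kn): value} into nested plain dicts.

    Hypothetical helper: each key must be a non-empty tuple, and the last
    element of the tuple becomes the key of the leaf value.
    """
    nested = {}
    for path, value in flat.items():
        node = nested
        # Walk (and create on demand) every level except the last one.
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


fd = {
    ('new jersey', 'mercer county', 'plumbers'): 3,
    ('new jersey', 'mercer county', 'programmers'): 81,
    ('new york', 'queens county', 'salesmen'): 36,
}
result = nest(fd)
print(result)
```

The result is made of ordinary dicts, so lookups of missing keys still raise KeyError.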

User · Answer

I used to use this function. It's safe, quick, and easily maintainable.

    def deep_get(dictionary, keys, default=None):
        return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

Example:

    >>> from functools import reduce
    >>> def deep_get(dictionary, keys, default=None):
    ...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
    ...
    >>> person = {'person': {'name': {'first': 'John'}}}
    >>> print(deep_get(person, "person.name.first"))
    John
    >>> print(deep_get(person, "person.name.lastname"))
    None
    >>> print(deep_get(person, "person.name.lastname", default="No lastname"))
    No lastname
    >>>

User · Answer

You can use Addict: https://github.com/mewwts/addict

    >>> from addict import Dict
    >>> my_new_shiny_dict = Dict()
    >>> my_new_shiny_dict.a.b.c.d.e = 2
    >>> my_new_shiny_dict
    {'a': {'b': {'c': {'d': {'e': 2}}}}}

User · Answer

    class JobDb(object):
        def __init__(self):
            self.data = []     # slots holding (key, value) pairs, or None when freed
            self.all = set()   # indices of all live slots
            self.free = []     # indices of freed slots available for reuse
            self.index1 = {}   # state  -> set of slot indices
            self.index2 = {}   # county -> set of slot indices
            self.index3 = {}   # title  -> set of slot indices

        def _indices(self, key):
            key1, key2, key3 = key  # unpack here; tuple parameters are Python 2 only
            indices = self.all.copy()
            wild = False
            for index, k in ((self.index1, key1), (self.index2, key2),
                             (self.index3, key3)):
                if k is not None:
                    indices &= index.setdefault(k, set())
                else:
                    wild = True
            return indices, wild

        def __getitem__(self, key):
            indices, wild = self._indices(key)
            if wild:
                return dict(self.data[i] for i in indices)
            else:
                values = [self.data[i][-1] for i in indices]
                if values:
                    return values[0]

        def __setitem__(self, key, value):
            indices, wild = self._indices(key)
            if indices:
                for i in indices:
                    self.data[i] = key, value
            elif wild:
                raise KeyError(key)
            else:
                if self.free:
                    index = self.free.pop(0)
                    self.data[index] = key, value
                else:
                    index = len(self.data)
                    self.data.append((key, value))
                # register the slot in both branches so reused slots stay findable
                self.all.add(index)
                self.index1.setdefault(key[0], set()).add(index)
                self.index2.setdefault(key[1], set()).add(index)
                self.index3.setdefault(key[2], set()).add(index)

        def __delitem__(self, key):
            indices, wild = self._indices(key)
            if not indices:
                raise KeyError
            self.index1[key[0]] -= indices
            self.index2[key[1]] -= indices
            self.index3[key[2]] -= indices
            self.all -= indices
            for i in indices:
                self.data[i] = None
            self.free.extend(indices)

        def __len__(self):
            return len(self.all)

        def __iter__(self):
            for entry in self.data:
                if entry is not None:  # skip slots freed by __delitem__
                    yield entry[0]

Example:

    >>> db = JobDb()
    >>> db['new jersey', 'mercer county', 'plumbers'] = 3
    >>> db['new jersey', 'mercer county', 'programmers'] = 81
    >>> db['new jersey', 'middlesex county', 'programmers'] = 81
    >>> db['new jersey', 'middlesex county', 'salesmen'] = 62
    >>> db['new york', 'queens county', 'plumbers'] = 9
    >>> db['new york', 'queens county', 'salesmen'] = 36

    >>> db['new york', None, None]
    {('new york', 'queens county', 'plumbers'): 9,
     ('new york', 'queens county', 'salesmen'): 36}

    >>> db[None, None, 'plumbers']
    {('new jersey', 'mercer county', 'plumbers'): 3,
     ('new york', 'queens county', 'plumbers'): 9}

    >>> db['new jersey', 'mercer county', None]
    {('new jersey', 'mercer county', 'plumbers'): 3,
     ('new jersey', 'mercer county', 'programmers'): 81}

    >>> db['new jersey', 'middlesex county', 'programmers']
    81

    >>>

Edit: Now returning dictionaries when querying with wild cards (None), and single values otherwise.

User · Answer

I like the idea of wrapping this in a class and implementing __getitem__ and __setitem__ such that they implement a simple query language:

    >>> d['new jersey/mercer county/plumbers'] = 3
    >>> d['new jersey/mercer county/programmers'] = 81
    >>> d['new jersey/mercer county/programmers']
    81
    >>> d['new jersey/mercer county']
    <view which implicitly adds 'new jersey/mercer county' to queries/mutations>

If you wanted to get fancy you could also implement something like:

    >>> d['*/*/programmers']
    <view which would contain 'programmers' entries>

but mostly I think such a thing would be really fun to implement :D
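A very small sketch of the path-string idea above, to show its general shape. Everything here is an assumption for illustration: the class name `PathDict`, the "/" separator, and the choice to return the underlying sub-dictionary for a prefix instead of a dedicated view object. Wildcard queries are not implemented.

```python
class PathDict(object):
    """Hypothetical sketch: nested dicts addressed by 'a/b/c' path strings."""

    def __init__(self, sep="/"):
        self._sep = sep
        self._root = {}

    def __setitem__(self, path, value):
        keys = path.split(self._sep)
        node = self._root
        # Create intermediate levels on demand, then set the leaf.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value

    def __getitem__(self, path):
        node = self._root
        for key in path.split(self._sep):
            node = node[key]  # raises KeyError if any level is missing
        return node


d = PathDict()
d['new jersey/mercer county/plumbers'] = 3
d['new jersey/mercer county/programmers'] = 81
print(d['new jersey/mercer county/programmers'])
print(d['new jersey/mercer county'])
```

Because reads never create keys, a misspelled path raises KeyError rather than adding an empty entry.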
