Construct pandas DataFrame from items in nested dictionary

Question

Suppose I have a nested dictionary 'user_dict' with structure:

Level 1: UserId (Long Integer)
Level 2: Category (String)
Level 3: Assorted Attributes (floats, ints, etc.)

For example, an entry of this dictionary would be:

user_dict[12] = {
    "Category 1": {"att_1": 1,
                   "att_2": "whatever"},
    "Category 2": {"att_1": 23,
                   "att_2": "another"}}

Each item in user_dict has the same structure, and user_dict contains a large number of items which I want to feed to a pandas DataFrame, constructing the series from the attributes. In this case a hierarchical index would be useful for the purpose.

Specifically, my question is whether there exists a way to help the DataFrame constructor understand that the series should be built from the values of the "level 3" in the dictionary?

If I try something like:

df = pandas.DataFrame(users_summary)

The items in "level 1" (the UserId's) are taken as columns, which is the opposite of what I want to achieve (have UserId's as index).

I know I could construct the series after iterating over the dictionary entries, but if there is a more direct way this would be very useful. A similar question would be asking whether it is possible to construct a pandas DataFrame from json objects listed in a file.

User · Accepted Answer

A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can just construct your dataframe using pd.DataFrame.from_dict, using the option orient='index':

import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

pd.DataFrame.from_dict({(i,j): user_dict[i][j] 
                           for i in user_dict.keys() 
                           for j in user_dict[i].keys()},
                       orient='index')


               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
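Once the frame has a two-level index, the levels can be named and used for selection. The following is a minimal sketch rather than part of the original answer; the level names 'UserId' and 'Category' are chosen here for illustration only.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Same construction as above: tuple keys become the MultiIndex rows.
df = pd.DataFrame.from_dict(
    {(user_id, category): attrs
     for user_id, categories in user_dict.items()
     for category, attrs in categories.items()},
    orient='index')

# Name the two index levels (names are illustrative assumptions).
df = df.rename_axis(['UserId', 'Category'])

# All categories for one user: a frame indexed by Category.
user_12 = df.loc[12]

# One category across all users: a frame indexed by UserId.
cat_1 = df.xs('Category 1', level='Category')
```

Naming the levels is optional, but it lets later calls such as xs refer to a level by name instead of by position.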

An alternative approach would be to build your dataframe up by concatenating the component dataframes:

user_ids = []
frames = []

for user_id, d in user_dict.items():  # use .iteritems() on Python 2
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar

User · Answer

In case someone wants to get the data frame in a "long format" (leaf values have the same type) without multiindex, you can do this:

pd.DataFrame.from_records(
    [
        (level1, level2, level3, leaf)
        for level1, level2_dict in user_dict.items()
        for level2, level3_dict in level2_dict.items()
        for level3, leaf in level3_dict.items()
    ],
    columns=['UserId', 'Category', 'Attribute', 'value']
)

   UserId    Category Attribute     value
0      12  Category 1     att_1         1
1      12  Category 1     att_2  whatever
2      12  Category 2     att_1        23
3      12  Category 2     att_2   another
4      15  Category 1     att_1        10
5      15  Category 1     att_2       foo
6      15  Category 2     att_1        30
7      15  Category 2     att_2       bar

I know the original question probably wants (I.) to have Levels 1 and 2 as multiindex and Level 3 as columns and (II.) asks about other ways than iteration over values in the dict. But I hope this answer is still relevant and useful (I.) to people like me who have tried to find a way to get the nested dict into this shape and google only returns this question and (II.) because other answers involve some iteration as well and I find this approach flexible and easy to read; not sure about performance, though.
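If the long format is only an intermediate step, it can be turned back into the shape the question asks for (UserId and Category as index levels, attributes as columns). This is a sketch under the assumption that each (UserId, Category, Attribute) combination occurs at most once; the variable names long_df and wide_df are illustrative.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Long format: one row per leaf value.
long_df = pd.DataFrame.from_records(
    [(user_id, category, attribute, value)
     for user_id, categories in user_dict.items()
     for category, attrs in categories.items()
     for attribute, value in attrs.items()],
    columns=['UserId', 'Category', 'Attribute', 'value'])

# Wide format: move the Attribute level from the rows into the columns.
wide_df = (long_df
           .set_index(['UserId', 'Category', 'Attribute'])['value']
           .unstack('Attribute'))
```

Because the leaf values mix ints and strings, the value column (and the unstacked columns) will typically have object dtype.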

User · Answer

So I used to use a for loop for iterating through the dictionary as well, but one thing I've found that works much faster is to convert to a panel and then to a dataframe. Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this only works on older versions.

Say you have a dictionary d:

import datetime
import pandas as pd

d = {'RAY Index': {datetime.date(2014, 11, 3): {'PX_LAST': 1199.46, 'PX_OPEN': 1200.14},
                   datetime.date(2014, 11, 4): {'PX_LAST': 1195.323, 'PX_OPEN': 1197.69},
                   datetime.date(2014, 11, 5): {'PX_LAST': 1200.936, 'PX_OPEN': 1195.32},
                   datetime.date(2014, 11, 6): {'PX_LAST': 1206.061, 'PX_OPEN': 1200.62}},
     'SPX Index': {datetime.date(2014, 11, 3): {'PX_LAST': 2017.81, 'PX_OPEN': 2018.21},
                   datetime.date(2014, 11, 4): {'PX_LAST': 2012.1, 'PX_OPEN': 2015.81},
                   datetime.date(2014, 11, 5): {'PX_LAST': 2023.57, 'PX_OPEN': 2015.29},
                   datetime.date(2014, 11, 6): {'PX_LAST': 2031.21, 'PX_OPEN': 2023.33}}}

The command

pd.Panel(d)

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 4 (minor_axis)
Items axis: RAY Index to SPX Index
Major_axis axis: PX_LAST to PX_OPEN
Minor_axis axis: 2014-11-03 to 2014-11-06

where pd.Panel(d)[item] yields a dataframe

pd.Panel(d)['SPX Index']

         2014-11-03  2014-11-04  2014-11-05  2014-11-06
PX_LAST     2017.81     2012.10     2023.57     2031.21
PX_OPEN     2018.21     2015.81     2015.29     2023.33

You can then hit the command to_frame() to turn it into a dataframe. I use reset_index as well to turn the major and minor axis into columns rather than have them as indices.

pd.Panel(d).to_frame().reset_index()

     major       minor  RAY Index  SPX Index
   PX_LAST  2014-11-03   1199.460    2017.81
   PX_LAST  2014-11-04   1195.323    2012.10
   PX_LAST  2014-11-05   1200.936    2023.57
   PX_LAST  2014-11-06   1206.061    2031.21
   PX_OPEN  2014-11-03   1200.140    2018.21
   PX_OPEN  2014-11-04   1197.690    2015.81
   PX_OPEN  2014-11-05   1195.320    2015.29
   PX_OPEN  2014-11-06   1200.620    2023.33

Finally, if you don't like the way the frame looks you can use the transpose function of panel to change the appearance before calling to_frame(); see documentation here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.Panel.transpose.html

Just as an example

pd.Panel(d).transpose(2,0,1).to_frame().reset_index()

     major    minor  2014-11-03  2014-11-04  2014-11-05  2014-11-06
 RAY Index  PX_LAST     1199.46    1195.323    1200.936    1206.061
 RAY Index  PX_OPEN     1200.14    1197.690    1195.320    1200.620
 SPX Index  PX_LAST     2017.81    2012.100    2023.570    2031.210
 SPX Index  PX_OPEN     2018.21    2015.810    2015.290    2023.330

Hope this helps.
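On pandas versions where pd.Panel is no longer available, a similar (major, minor) by ticker layout can be approximated without it. The sketch below is one possible substitute, not the Panel API itself: it flattens the dictionary into records and pivots them. It uses only the first two dates from the example to stay short, the column name 'ticker' is an illustrative choice, and passing a list to pivot's index argument assumes pandas 1.1 or later.

```python
import datetime
import pandas as pd

# Subset of the example data above (first two dates only).
d = {'RAY Index': {datetime.date(2014, 11, 3): {'PX_LAST': 1199.46, 'PX_OPEN': 1200.14},
                   datetime.date(2014, 11, 4): {'PX_LAST': 1195.323, 'PX_OPEN': 1197.69}},
     'SPX Index': {datetime.date(2014, 11, 3): {'PX_LAST': 2017.81, 'PX_OPEN': 2018.21},
                   datetime.date(2014, 11, 4): {'PX_LAST': 2012.1, 'PX_OPEN': 2015.81}}}

# Flatten to one record per (field, date, ticker) leaf value.
records = [(field, day, ticker, value)
           for ticker, by_day in d.items()
           for day, fields in by_day.items()
           for field, value in fields.items()]
long_df = pd.DataFrame.from_records(
    records, columns=['major', 'minor', 'ticker', 'value'])

# Rows keyed by (field, date), one column per ticker.
wide = long_df.pivot(index=['major', 'minor'], columns='ticker', values='value')

# Turn the two index levels back into ordinary columns.
flat = wide.reset_index()
```

This trades the Panel's speed claim for portability; no timing comparison is implied here.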

User · Answer

Building on the verified answer, for me this worked best:

ab = pd.concat({k: pd.DataFrame(v).T for k, v in data.items()}, axis=0)
ab.T

User · Answer

pd.concat accepts a dictionary. With this in mind, it is possible to improve upon the currently accepted answer in terms of simplicity and performance by using a dictionary comprehension to build a dictionary mapping keys to sub-frames.

pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0)

Or,

pd.concat({
        k: pd.DataFrame.from_dict(v, 'index') for k, v in user_dict.items()
    },
    axis=0)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
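As a small extension of the dictionary form, pd.concat also takes a names argument that labels the levels of the resulting hierarchical index. A minimal sketch, with the level names 'UserId' and 'Category' assumed for illustration:

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Dict keys become the outer index level; each sub-frame's own index
# (the categories) becomes the inner level.
df = pd.concat(
    {user_id: pd.DataFrame.from_dict(categories, orient='index')
     for user_id, categories in user_dict.items()},
    names=['UserId', 'Category'])

# Look up a single value by (UserId, Category) and column.
value = df.loc[(15, 'Category 2'), 'att_2']
```

Naming the levels at construction time avoids a separate rename step afterwards.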
