Creating a pandas DataFrame from columns of other DataFrames with similar indexes

Question

I have 2 DataFrames, df1 and df2, with the same column names ['a', 'b', 'c'] and indexed by dates. The date indexes can have similar values. I would like to create a DataFrame df3 with only the data from columns ['c'], renamed respectively 'df1' and 'df2', and with the correct date index. My problem is that I cannot work out how to merge the indexes properly.

    df1 = pd.DataFrame(np.random.randn(5, 3), index=pd.date_range('01/02/2014', periods=5, freq='D'), columns=['a', 'b', 'c'])
    df2 = pd.DataFrame(np.random.randn(8, 3), index=pd.date_range('01/01/2014', periods=8, freq='D'), columns=['a', 'b', 'c'])

    df1
                       a         b         c
    2014-01-02  0.580550  0.480814  1.135899
    2014-01-03 -1.961033  0.546013  1.093204
    2014-01-04  2.063441 -0.627297  2.035373
    2014-01-05  0.319570  0.058588  0.350060
    2014-01-06  1.318068 -0.802209 -0.939962

    df2
                       a         b         c
    2014-01-01  0.772482  0.899337  0.808630
    2014-01-02  0.518431 -1.582113  0.323425
    2014-01-03  0.112109  1.056705 -1.355067
    2014-01-04  0.767257 -2.311014  0.340701
    2014-01-05  0.794281 -1.954858  0.200922
    2014-01-06  0.156088  0.718658 -1.030077
    2014-01-07  1.621059  0.106656 -0.472080
    2014-01-08 -2.061138 -2.023157  0.257151

The df3 DataFrame should have the following form:

    df3
                     df1       df2
    2014-01-01       NaN  0.808630
    2014-01-02  1.135899  0.323425
    2014-01-03  1.093204 -1.355067
    2014-01-04  2.035373  0.340701
    2014-01-05  0.350060  0.200922
    2014-01-06 -0.939962 -1.030077
    2014-01-07       NaN -0.472080
    2014-01-08       NaN  0.257151

That is, with NaN in the df1 column wherever a date exists only in df2, since the date index of df2 is wider. In this example, I would get NaN for the following dates: 2014-01-01, 2014-01-07 and 2014-01-08.

Thanks for your help.

User · Answer

Well, I'm not sure that merge would be the way to go. Personally, I would build a new data frame by creating an index of the dates and then constructing the columns using list comprehensions. Possibly not the most Pythonic way, but it seems to work for me.

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(5, 3), index=pd.date_range('01/02/2014', periods=5, freq='D'), columns=['a', 'b', 'c'])
    df2 = pd.DataFrame(np.random.randn(8, 3), index=pd.date_range('01/01/2014', periods=8, freq='D'), columns=['a', 'b', 'c'])

    # Create an index list from the set of dates in both data frames
    Index = list(set(list(df1.index) + list(df2.index)))
    Index.sort()

    # For each date, take the value of column 'c' if the date exists in that frame, else NaN
    df3 = pd.DataFrame({'df1': [df1.loc[Date, 'c'] if Date in df1.index else np.nan for Date in Index],
                        'df2': [df2.loc[Date, 'c'] if Date in df2.index else np.nan for Date in Index]},
                       index=Index)

    df3
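The same idea (build the combined date index, then align each column to it) can also be sketched with pandas' own index operations instead of list comprehensions. This is a minimal sketch and not the answerer's code: it assumes `Index.union` and `Series.reindex` suit the question's setup, and the fixed seed and the name `all_dates` are illustrative additions.

```python
import numpy as np
import pandas as pd

# Same shapes and date ranges as in the question. The seed is an assumption
# added here so that the example is repeatable.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((5, 3)),
                   index=pd.date_range('2014-01-02', periods=5, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(rng.standard_normal((8, 3)),
                   index=pd.date_range('2014-01-01', periods=8, freq='D'),
                   columns=['a', 'b', 'c'])

# Sorted union of both date indexes (all_dates is a hypothetical name).
all_dates = df1.index.union(df2.index)

# reindex aligns each 'c' column to the combined index and puts NaN
# on the dates that the original frame does not contain.
df3 = pd.DataFrame({'df1': df1['c'].reindex(all_dates),
                    'df2': df2['c'].reindex(all_dates)})
print(df3)
```

Compared with the list comprehensions, this avoids a Python-level membership test per date, which may matter for long indexes.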

User · Answer

What you ask for is the join operation. With the how argument, you can define how indices that appear in only one of the frames are handled. Here is some article which looks helpful concerning this point. In the example below, I left out cosmetics (like renaming columns) for simplicity.

Code:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(np.random.randn(5, 3), index=pd.date_range('01/02/2014', periods=5, freq='D'), columns=['a', 'b', 'c'])
    df2 = pd.DataFrame(np.random.randn(8, 3), index=pd.date_range('01/01/2014', periods=8, freq='D'), columns=['a', 'b', 'c'])

    # Outer join keeps every date from both indexes; the suffixes disambiguate the shared column names
    df3 = df1.join(df2, how='outer', lsuffix='_df1', rsuffix='_df2')
    print(df3)

Output:

                   a_df1     b_df1     c_df1     a_df2     b_df2     c_df2
    2014-01-01       NaN       NaN       NaN  0.109898  1.107033 -1.045376
    2014-01-02  0.573754  0.169476 -0.580504 -0.664921 -0.364891 -1.215334
    2014-01-03 -0.766361 -0.739894 -1.096252  0.962381 -0.860382 -0.703269
    2014-01-04  0.083959 -0.123795 -1.405974  1.825832 -0.580343  0.923202
    2014-01-05  1.019080 -0.086650  0.126950 -0.021402 -1.686640  0.870779
    2014-01-06 -1.036227 -1.103963 -0.821523 -0.943848 -0.905348  0.430739
    2014-01-07       NaN       NaN       NaN  0.312005  0.586585  1.531492
    2014-01-08       NaN       NaN       NaN -0.077951 -1.189960  0.995123
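For completeness, the cosmetics that this answer leaves out could look roughly as follows. This is only a sketch under assumptions: the fixed seed and the names `left` and `right` are illustrative and not part of the answer, and selecting and renaming column 'c' before the join is one of several ways to get the two-column result the question asks for.

```python
import numpy as np
import pandas as pd

# Same shapes and date ranges as in the question. The seed is an assumption
# added here so that the example is repeatable.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((5, 3)),
                   index=pd.date_range('2014-01-02', periods=5, freq='D'),
                   columns=['a', 'b', 'c'])
df2 = pd.DataFrame(rng.standard_normal((8, 3)),
                   index=pd.date_range('2014-01-01', periods=8, freq='D'),
                   columns=['a', 'b', 'c'])

# Keep only column 'c' of each frame (double brackets keep a DataFrame)
# and rename it after the frame it came from. left/right are hypothetical names.
left = df1[['c']].rename(columns={'c': 'df1'})
right = df2[['c']].rename(columns={'c': 'df2'})

# The outer join keeps every date that appears in either index;
# no suffixes are needed because the column names no longer clash.
df3 = left.join(right, how='outer')
print(df3)
```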

User · Answer

You can use concat:

    In [11]: pd.concat([df1['c'], df2['c']], axis=1, keys=['df1', 'df2'])
    Out[11]:
                     df1       df2
    2014-01-01       NaN -0.978535
    2014-01-02 -0.106510 -0.519239
    2014-01-03 -0.846100 -0.313153
    2014-01-04 -0.014253 -1.040702
    2014-01-05  0.315156 -0.329967
    2014-01-06 -0.510577 -0.940901
    2014-01-07       NaN -0.024608
    2014-01-08       NaN -1.791899

    [8 rows x 2 columns]

The axis argument determines the way the DataFrames are stacked:

    df1 = pd.DataFrame([1, 2, 3])
    df2 = pd.DataFrame(['a', 'b', 'c'])

    pd.concat([df1, df2], axis=0)
       0
    0  1
    1  2
    2  3
    0  a
    1  b
    2  c

    pd.concat([df1, df2], axis=1)
       0  0
    0  1  a
    1  2  b
    2  3  c
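Besides axis, pd.concat also takes a join argument that controls which index labels are kept when concatenating along axis=1; the default, 'outer', is what produces the NaN rows for dates missing from one input. The sketch below uses small made-up Series (`s1`, `s2` and their values are illustrative, not data from the question) to contrast the default with join='inner'.

```python
import pandas as pd

# Two small Series with overlapping but different date indexes.
# The values are made up for illustration only.
s1 = pd.Series([1.0, 2.0, 3.0],
               index=pd.date_range('2014-01-02', periods=3, freq='D'))
s2 = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0],
               index=pd.date_range('2014-01-01', periods=5, freq='D'))

# Default join='outer': union of the dates, NaN where a Series has no value.
# sort_index() makes the row order explicit instead of relying on concat's default.
outer = pd.concat([s1, s2], axis=1, keys=['df1', 'df2']).sort_index()

# join='inner': only the dates present in both Series, so no NaN appears.
inner = pd.concat([s1, s2], axis=1, keys=['df1', 'df2'], join='inner').sort_index()

print(outer)
print(inner)
```

For the result the question asks for, the default outer behaviour is the one wanted; join='inner' would silently drop 2014-01-01 and any other date missing from either input.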
