Pandas merge two dataframes with different columns

Question

I'm surely missing something simple here. I'm trying to merge two dataframes in pandas that have mostly the same column names, but the right dataframe has some columns that the left doesn't have, and vice versa.

    >df_may

       id  quantity  attr_1  attr_2
    0   1        20       0       1
    1   2        23       1       1
    2   3        19       1       1
    3   4        19       0       0

    >df_jun

       id  quantity  attr_1  attr_3
    0   5         8       1       0
    1   6        13       0       1
    2   7        20       1       1
    3   8        25       1       1

I've tried joining with an outer join:

    mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer")

But that yields:

    Left data columns not unique: Index([...

I've also specified a single column to join on (on="id", e.g.), but that duplicates all columns except "id", like attr_1_x, attr_1_y, which is not ideal. I've also passed the entire list of columns (there are many) to "on":

    mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer", on=list(df_may.columns.values))

Which yields:

    ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

What am I missing? I'd like to get a df with all rows appended, and attr_1, attr_2, attr_3 populated where possible, NaN where they don't show up. This seems like a pretty typical workflow for data munging, but I'm stuck.

Thanks in advance.

User · Accepted Answer

I think in this case concat is what you want. Here df and df1 are the two dataframes from the question:

    In [12]:
    pd.concat([df, df1], axis=0, ignore_index=True)

    Out[12]:
       attr_1  attr_2  attr_3  id  quantity
    0       0       1     NaN   1        20
    1       1       1     NaN   2        23
    2       1       1     NaN   3        19
    3       0       0     NaN   4        19
    4       1     NaN       0   5         8
    5       0     NaN       1   6        13
    6       1     NaN       1   7        20
    7       1     NaN       1   8        25

By passing axis=0 you are stacking the dfs on top of each other, which I believe is what you want. Columns are matched by name, and NaN is produced wherever a column is absent from one of the dfs.
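For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the two frames from the question and stacks them. The variable name `combined` is just an illustrative choice, and the column order you see may differ from the output above depending on your pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# axis=0 stacks rows; columns are aligned by name, and any column
# missing from one frame is filled with NaN for that frame's rows.
# ignore_index=True renumbers the rows 0..7 instead of repeating 0..3.
combined = pd.concat([df_may, df_jun], axis=0, ignore_index=True)
print(combined)
```

Note that attr_2 and attr_3 come back as floats, since an integer column cannot hold NaN by default.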

User · Answer

I had this problem today using any of concat, append or merge, and I got around it by adding a sequentially numbered helper column and then doing an outer join:

    helper = 1
    for i in df1.index:
        df1.loc[i, 'helper'] = helper
        helper = helper + 1
    for i in df2.index:
        df2.loc[i, 'helper'] = helper
        helper = helper + 1
    df1.merge(df2, on='helper', how='outer')
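If you do want to stay with merge, one possible variation (not from the answer above, so treat it as a sketch) is to join on only the column names the two frames share, which avoids both the helper loop and the _x/_y suffixes. The names `shared` and `merged` are illustrative, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# Only columns present in BOTH frames can be used as join keys.
shared = [col for col in df_may.columns if col in df_jun.columns]

# An outer join keeps every row from both sides; columns that exist
# on only one side are NaN for rows that came from the other side.
merged = df_may.merge(df_jun, how="outer", on=shared)
print(merged)
```

One difference from concat to keep in mind: if a row in each frame has identical values in every shared column, merge combines them into a single row, whereas concat keeps both.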
