Pandas Merge - How to avoid duplicating columns

Question

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (currency, adj_date, for example).

What is the best way to merge these by index, but to not take two copies of currency and adj_date?

Each data frame is 90 columns, so I am trying to avoid writing everything out by hand.

df                  currency  adj_date   data_col1 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45

df2                 currency  adj_date   data_col2 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45

If I do:

dfNew = merge(df, df2, left_index=True, right_index=True, how='outer')

I get:

dfNew               currency_x  adj_date_x   data_col2 ... currency_y adj_date_y
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45             USD         2012-01-03

Thank you!

User · Accepted Answer

You can work out the columns that are only in one DataFrame and use this to select a subset of columns in the merge.

cols_to_use = df2.columns.difference(df.columns)

Then perform the merge (note that cols_to_use is an Index object, but it has a handy tolist() method if you need a plain list).

dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')

This will avoid any columns clashing in the merge.
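For anyone who wants to try this end to end, here is a minimal sketch. The sample frames are made up to resemble the ones in the question (the values and the single row are assumptions), and `pd.merge` is used in place of a bare `merge` import. One detail worth knowing: `Index.difference` returns its result sorted, so the extra columns may not come back in their original order.

```python
import pandas as pd

# Hypothetical two-level index and sample values, modelled on the question.
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2012-01-01"), "XSDP")], names=["date", "cusip"]
)
df = pd.DataFrame(
    {"currency": ["USD"], "adj_date": ["2012-01-03"], "data_col1": [0.45]},
    index=idx,
)
df2 = pd.DataFrame(
    {"currency": ["USD"], "adj_date": ["2012-01-03"], "data_col2": [0.45]},
    index=idx,
)

# Columns that exist only in df2 (returned as a sorted Index).
cols_to_use = df2.columns.difference(df.columns)

# Merge on the index, taking only the non-overlapping columns from df2.
dfNew = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how="outer")
print(dfNew.columns.tolist())
# ['currency', 'adj_date', 'data_col1', 'data_col2']
```

Since the shared columns are taken from df only, this assumes they hold the same values in both frames.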

User · Answer

This is a bit of going around the problem, but I have written a function that basically deals with the extra columns:

import pandas as pd

def merge_fix_cols(df_company, df_product, uniqueID):
    df_merged = pd.merge(df_company,
                         df_product,
                         how='left', left_on=uniqueID, right_on=uniqueID)
    # Drop the right-hand copies of the clashing columns.
    to_drop = [col for col in df_merged if col.endswith('_y')]
    df_merged = df_merged.drop(to_drop, axis=1)
    # Remove only the literal '_x' suffix; rstrip('_x') would strip any
    # trailing '_' or 'x' characters (turning 'index_x' into 'inde').
    df_merged = df_merged.rename(
        columns=lambda col: col[:-2] if col.endswith('_x') else col)
    return df_merged

Seems to work well with my merges!
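A word of caution for anyone stripping the `_x` suffix by hand: `str.rstrip` takes a set of characters, not a suffix, so it can eat more of the name than intended. The sketch below illustrates this with made-up column names; `strip_suffix` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Hypothetical merged column names; "index_x" and "tax_x" end in characters
# that rstrip("_x") will keep stripping.
merged = pd.DataFrame(columns=["index_x", "tax_x", "price_x", "price_y"])

# rstrip("_x") removes every trailing "_" or "x", not the literal suffix.
over_stripped = [c.rstrip("_x") for c in merged.columns]
print(over_stripped)
# ['inde', 'ta', 'price', 'price_y']

def strip_suffix(name, suffix="_x"):
    """Remove `suffix` only when `name` actually ends with it."""
    return name[: -len(suffix)] if name.endswith(suffix) else name

# Drop the right-hand duplicates, then strip the suffix safely.
cleaned = merged.drop(columns=[c for c in merged.columns if c.endswith("_y")])
cleaned = cleaned.rename(columns=strip_suffix)
print(cleaned.columns.tolist())
# ['index', 'tax', 'price']
```

On Python 3.9 or later, `str.removesuffix("_x")` does the same job as the helper.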

User · Answer

Building on @rprog's answer, you can combine the various pieces of the suffix & filter step into one line using a negative regex:

dfNew = df.merge(df2, left_index=True, right_index=True,
                 how='outer', suffixes=('', '_DROP')).filter(regex='^(?!.*_DROP)')

Or using df.join:

dfNew = df.join(df2, lsuffix="DROP").filter(regex="^(?!.*DROP)")

The regex here keeps every column whose name does not contain the word "DROP", so just make sure to use a suffix that doesn't appear among the columns already.
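To see what the suffix and the negative lookahead each do, here is a small sketch with the two steps split apart. The frames are invented sample data with a single-level index, which is an assumption made for brevity.

```python
import pandas as pd

# Hypothetical frames sharing one column ("currency").
df = pd.DataFrame({"currency": ["USD"], "data_col1": [0.45]}, index=["XSDP"])
df2 = pd.DataFrame({"currency": ["USD"], "data_col2": [0.45]}, index=["XSDP"])

# Step 1: left-hand duplicates keep their name, right-hand ones get "_DROP".
merged = df.merge(df2, left_index=True, right_index=True,
                  how="outer", suffixes=("", "_DROP"))
print(merged.columns.tolist())
# ['currency', 'data_col1', 'currency_DROP', 'data_col2']

# Step 2: the negative lookahead keeps only names that do not contain "_DROP".
kept = merged.filter(regex="^(?!.*_DROP)")
print(kept.columns.tolist())
# ['currency', 'data_col1', 'data_col2']
```

Chaining the two calls, as in the one-liner, gives the same result without the intermediate frame.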

User · Answer

I'm freshly new with Pandas but I wanted to achieve the same thing, automatically avoiding column names with _x or _y and removing duplicate data. I finally did it by using this answer and this one from Stackoverflow.

sales.csv

city;state;units
Mendocino;CA;1
Denver;CO;4
Austin;TX;2

revenue.csv

branch_id;city;revenue;state_id
10;Austin;100;TX
20;Austin;83;TX
30;Austin;4;TX
47;Austin;200;TX
20;Denver;83;CO
30;Springfield;4;I

merge.py

import pandas

def drop_y(df):
    # list comprehension of the cols that end with '_y'
    to_drop = [x for x in df if x.endswith('_y')]
    df.drop(to_drop, axis=1, inplace=True)

sales = pandas.read_csv('data/sales.csv', delimiter=';')
revenue = pandas.read_csv('data/revenue.csv', delimiter=';')

result = pandas.merge(sales, revenue, how='inner', left_on=['state'], right_on=['state_id'], suffixes=('', '_y'))
drop_y(result)
result.to_csv('results/output.csv', index=True, index_label='id', sep=';')

When executing the merge command I replace the _x suffix with an empty string, and then I can remove the columns ending with _y.

output.csv

id;city;state;units;branch_id;revenue;state_id
0;Denver;CO;4;20;83;CO
1;Austin;TX;2;10;100;TX
2;Austin;TX;2;20;83;TX
3;Austin;TX;2;30;4;TX
4;Austin;TX;2;47;200;TX

User · Answer

I use the suffixes option in .merge():

dfNew = df.merge(df2, left_index=True, right_index=True,
                 how='outer', suffixes=('', '_y'))
dfNew.drop(dfNew.filter(regex='_y$').columns.tolist(), axis=1, inplace=True)

Thanks @ijoseph
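One thing that may be worth checking with any of the drop-the-right-hand-copy approaches on an outer merge: rows that exist only in df2 have no left-hand value to fall back on. A rough sketch, using invented data where df2 has an extra row, and a non-inplace `drop` in place of `inplace=True`:

```python
import pandas as pd

# Hypothetical frames: df2 has one row ("ABCD") that df lacks.
df = pd.DataFrame({"currency": ["USD"], "data_col1": [0.45]}, index=["XSDP"])
df2 = pd.DataFrame(
    {"currency": ["USD", "EUR"], "data_col2": [0.45, 0.30]},
    index=["XSDP", "ABCD"],
)

merged = df.merge(df2, left_index=True, right_index=True,
                  how="outer", suffixes=("", "_y"))

# Same drop as above, but returning a new frame instead of mutating in place.
dup_cols = merged.filter(regex="_y$").columns
result = merged.drop(columns=dup_cols)

# The row that came only from df2 is left with a missing currency.
print(result.loc["ABCD", "currency"])
```

If those rows matter, filling the left column first, for example with `merged["currency"].combine_first(merged["currency_y"])`, is one possible option before dropping.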

User · Answer

Can't you just subset the columns in either df first?

[i for i in df.columns if i not in df2.columns]

dfNew = merge(df[[i for i in df.columns if i not in df2.columns]], df2, left_index=True, right_index=True, how='outer')
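A quick sketch of this variant on made-up data. Note that here the shared columns come from df2 rather than df, so the column order of the result differs from the approaches that trim df2; `left_only` is just an illustrative name.

```python
import pandas as pd

# Hypothetical frames sharing one column ("currency").
df = pd.DataFrame({"currency": ["USD"], "data_col1": [0.45]}, index=["XSDP"])
df2 = pd.DataFrame({"currency": ["USD"], "data_col2": [0.45]}, index=["XSDP"])

# Keep only the columns of df that df2 does not already have.
# A list comprehension preserves df's original column order.
left_only = [c for c in df.columns if c not in df2.columns]

dfNew = pd.merge(df[left_only], df2, left_index=True, right_index=True, how="outer")
print(dfNew.columns.tolist())
# ['data_col1', 'currency', 'data_col2']
```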
