[python] Merge two dataframes by index

This question has been resolved for a while, and all the available options are already out there. In this answer, however, I'll attempt to shed a bit more light on those options to help you understand when to use which.

This post will go through the following topics:

  • Merging with index under different conditions
    • options for index-based joins: merge, join, concat
    • merging on indexes
    • merging on index of one, column of other
  • effectively using named indexes to simplify merging syntax


Index-based joins

TL;DR

There are a few options, some simpler than others depending on the use case.

  1. DataFrame.merge with left_index and right_index (or left_on and right_on using named indexes)
  2. DataFrame.join (joins on index)
  3. pd.concat (joins on index)

Pros and cons of each option:

merge
  Pros:
    • supports inner/left/right/full
    • supports column-column, index-column, index-index joins
  Cons:
    • can only join two frames at a time

join
  Pros:
    • supports inner/left (default)/right/full
    • can join multiple DataFrames at a time
  Cons:
    • only supports index-index joins

concat
  Pros:
    • specializes in joining multiple DataFrames at a time
    • very fast (concatenation is linear time)
  Cons:
    • only supports inner/full (default) joins
    • only supports index-index joins


Index to index joins

Typically, an inner join on index would look like this:

left.merge(right, left_index=True, right_index=True)

Other types of joins (left, right, outer) follow similar syntax (and can be controlled using how=...).
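
To make the index-to-index case concrete, here is a minimal sketch. The frames `left` and `right`, their labels, and the column names `value_l` and `value_r` are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample frames; labels and column names are illustrative only.
left = pd.DataFrame({'value_l': [1, 2, 3]}, index=['A', 'B', 'C'])
right = pd.DataFrame({'value_r': [10, 20, 30]}, index=['B', 'C', 'D'])

# Inner join (merge's default): keep only labels present in both indexes.
inner = left.merge(right, left_index=True, right_index=True)

# Full outer join: keep every label from either index; missing cells become NaN.
outer = left.merge(right, left_index=True, right_index=True, how='outer')

# Left join: keep every label from `left`, matched or not.
left_join = left.merge(right, left_index=True, right_index=True, how='left')

print(inner)
print(outer)
print(left_join)
```

Only the `how=` argument changes between the three calls; the index flags stay the same.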

Notable Alternatives

  1. DataFrame.join defaults to a left outer join on the index.

     left.join(right, how='inner')
    

    If you get ValueError: columns overlap but no suffix specified, both frames share a column name, and you will need to pass the lsuffix= and rsuffix= arguments so the overlapping columns can be told apart.

  2. pd.concat joins on the index and can join two or more DataFrames at once. It does a full outer join by default.

     pd.concat([left, right], axis=1, sort=False)
    

    For more information on concat, see this post.
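
The two alternatives above can be sketched as follows. The sample frames and the suffixes `_l` and `_r` are assumptions chosen for illustration; both frames deliberately share the column name `value` to show why `join` needs suffixes. The `keys=` argument to `pd.concat` is an optional extra used here only to keep the two same-named columns distinguishable.

```python
import pandas as pd

# Hypothetical frames that share a column name on purpose.
left = pd.DataFrame({'value': [1, 2, 3]}, index=['A', 'B', 'C'])
right = pd.DataFrame({'value': [10, 20, 30]}, index=['B', 'C', 'D'])

# join aligns on the index; suffixes disambiguate the overlapping 'value' column.
joined = left.join(right, how='inner', lsuffix='_l', rsuffix='_r')

# concat along axis=1 also aligns on the index; the default is a full outer join.
# keys= labels each input frame, producing two-level column names.
outer = pd.concat([left, right], axis=1, keys=['left', 'right'], sort=False)

# join='inner' restricts the result to labels present in every frame.
inner = pd.concat([left, right], axis=1, keys=['left', 'right'], join='inner')

print(joined)
print(outer)
print(inner)
```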


Index to Column joins

To perform an inner join using the index of left and a column of right, use DataFrame.merge with a combination of left_index=True and right_on=....

left.merge(right, left_index=True, right_on='key')

Other joins follow a similar structure. Note that only merge can perform index to column joins. You can join on multiple levels/columns, provided the number of index levels on the left equals the number of columns on the right.
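
A minimal sketch of the index-to-column case, assuming made-up frames: the index name `idxkey` and the column names `key`, `c1`, and `c2` are hypothetical. It also shows the named-index variant (passing the index name to `left_on`) and a two-level index matched against two columns.

```python
import pandas as pd

# Hypothetical frames: `left` is keyed by a named index, `right` by a column.
left = pd.DataFrame({'value_l': [1, 2, 3]},
                    index=pd.Index(['A', 'B', 'C'], name='idxkey'))
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value_r': [10, 20, 30]})

# Index of left against a column of right.
by_flag = left.merge(right, left_index=True, right_on='key')

# Same join, referring to the index by its name instead of left_index=True.
by_name = left.merge(right, left_on='idxkey', right_on='key')

# Two index levels on the left matched against two columns on the right.
left_multi = pd.DataFrame(
    {'value_l': [1, 2]},
    index=pd.MultiIndex.from_tuples([('A', 1), ('B', 2)], names=['k1', 'k2']),
)
right_multi = pd.DataFrame({'c1': ['A', 'B'], 'c2': [1, 3], 'value_r': [10, 20]})
multi = left_multi.merge(right_multi, left_index=True, right_on=['c1', 'c2'])

print(by_flag)
print(by_name)
print(multi)
```

In the last call only the pair ('A', 1) appears on both sides, so the inner join keeps a single row.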

join and concat are not capable of mixed merges. To use them here, you first need to move the join column into the index with DataFrame.set_index.
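
The set_index pre-step might look like this. The frames and the column name `key` are hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical frames: `left` is keyed by its index, `right` by the 'key' column.
left = pd.DataFrame({'value_l': [1, 2, 3]}, index=['A', 'B', 'C'])
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value_r': [10, 20, 30]})

# Pre-step: move the join column into the index so both sides are index-keyed.
right_indexed = right.set_index('key')

# Now the index-to-index tools apply.
joined = left.join(right_indexed, how='inner')
combined = pd.concat([left, right_indexed], axis=1, join='inner')

print(joined)
print(combined)
```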


This post is an abridged version of my work in Pandas Merging 101. Please follow this link for more examples and other topics on merging.
