This post will go through the following topics:
merge
, join
, concat
There are a few options, some simpler than others depending on the use case.
DataFrame.merge
withleft_index
andright_index
(orleft_on
andright_on
using names indexes)
- supports inner/left/right/full
- can only join two at a time
- supports column-column, index-column, index-index joins
DataFrame.join
(join on index)
- supports inner/left (default)/right/full
- can join multiple DataFrames at a time
- supports index-index joins
pd.concat
(joins on index)
- supports inner/full (default)
- can join multiple DataFrames at a time
- supports index-index joins
Setup & Basics
import pandas as pd
import numpy as np
np.random.seed([3, 14])
left = pd.DataFrame(data={'value': np.random.randn(4)},
index=['A', 'B', 'C', 'D'])
right = pd.DataFrame(data={'value': np.random.randn(4)},
index=['B', 'D', 'E', 'F'])
left.index.name = right.index.name = 'idxkey'
left
value
idxkey
A -0.602923
B -0.402655
C 0.302329
D -0.524349
right
value
idxkey
B 0.543843
D 0.013135
E -0.326498
F 1.385076
Typically, an inner join on index would look like this:
left.merge(right, left_index=True, right_index=True)
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
Other joins follow similar syntax.
Notable Alternatives
DataFrame.join
defaults to joins on the index. DataFrame.join
does a LEFT OUTER JOIN by default, so how='inner'
is necessary here.
left.join(right, how='inner', lsuffix='_x', rsuffix='_y')
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
Note that I needed to specify the lsuffix
and rsuffix
arguments since join
would otherwise error out:
left.join(right)
ValueError: columns overlap but no suffix specified: Index(['value'], dtype='object')
Since the column names are the same. This would not be a problem if they were differently named.
left.rename(columns={'value':'leftvalue'}).join(right, how='inner')
leftvalue value
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
pd.concat
joins on the index and can join two or more DataFrames at once. It does a full outer join by default, so how='inner'
is required here..
pd.concat([left, right], axis=1, sort=False, join='inner')
value value
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
For more information on concat
, see this post.
To perform an inner join using index of left, column of right, you will use DataFrame.merge
a combination of left_index=True
and right_on=...
.
right2 = right.reset_index().rename({'idxkey' : 'colkey'}, axis=1)
right2
colkey value
0 B 0.543843
1 D 0.013135
2 E -0.326498
3 F 1.385076
left.merge(right2, left_index=True, right_on='colkey')
value_x colkey value_y
0 -0.402655 B 0.543843
1 -0.524349 D 0.013135
Other joins follow a similar structure. Note that only merge
can perform index to column joins. You can join on multiple columns, provided the number of index levels on the left equals the number of columns on the right.
join
and concat
are not capable of mixed merges. You will need to set the index as a pre-step using DataFrame.set_index
.
If your index is named, then from pandas >= 0.23, DataFrame.merge
allows you to specify the index name to on
(or left_on
and right_on
as necessary).
left.merge(right, on='idxkey')
value_x value_y
idxkey
B -0.402655 0.543843
D -0.524349 0.013135
For the previous example of merging with the index of left, column of right, you can use left_on
with the index name of left:
left.merge(right2, left_on='idxkey', right_on='colkey')
value_x colkey value_y
0 -0.402655 B 0.543843
1 -0.524349 D 0.013135
Jump to other topics in Pandas Merging 101 to continue learning:
* you are here