This question has been resolved for a while, and all the available options are already out there. However, in this answer I'll attempt to shed a bit more light on these options to help you understand when to use what.
This post will go through the following topics: `merge`, `join`, and `concat`.
There are a few options, some simpler than others depending on the use case.

- `DataFrame.merge` with `left_index` and `right_index` (or `left_on` and `right_on` using named indexes)
- `DataFrame.join` (joins on index)
- `pd.concat` (joins on index)

| | PROS | CONS |
|---|---|---|
| `merge` | • supports inner/left/right/full | • can only join two frames at a time |
| `join` | • supports inner/left (default)/right/full | • only supports index-index joins |
| `concat` | • specializes in joining multiple DataFrames at a time | • only supports inner/full (default) joins |

Typically, an inner join on index would look like this:
left.merge(right, left_index=True, right_index=True)
Other types of joins (left, right, outer) follow similar syntax and can be controlled using `how=...`.
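To make the effect of `how` concrete, here is a minimal sketch. The frames `left` and `right` and their column names are made up for illustration; only the `merge` arguments come from the text above.

```python
import pandas as pd

# Hypothetical sample frames; the index labels act as the join keys.
left = pd.DataFrame({"value_l": [1, 2, 3]}, index=["A", "B", "C"])
right = pd.DataFrame({"value_r": [10, 20, 30]}, index=["B", "C", "D"])

# Inner join (merge's default): keeps only labels present in both indexes.
inner = left.merge(right, left_index=True, right_index=True)

# The other join types reuse the same call with a different `how`.
left_join = left.merge(right, left_index=True, right_index=True, how="left")
right_join = left.merge(right, left_index=True, right_index=True, how="right")
outer = left.merge(right, left_index=True, right_index=True, how="outer")

print(inner)
print(outer)
```

Labels missing on one side show up as `NaN` in that side's columns for the left, right, and outer variants.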
Notable Alternatives
`DataFrame.join` defaults to a left outer join on the index, so pass `how='inner'` explicitly for an inner join.
left.join(right, how='inner')
If you happen to get `ValueError: columns overlap but no suffix specified`, you will need to specify the `lsuffix=` and `rsuffix=` arguments to resolve this. Since the column names are the same, a differentiating suffix is required.
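A small sketch of that error and its fix follows. The frames and the `_x`/`_y` suffixes are my own choices for illustration; any distinct suffixes would do.

```python
import pandas as pd

# Hypothetical frames that share the column name "value".
left = pd.DataFrame({"value": [1, 2, 3]}, index=["A", "B", "C"])
right = pd.DataFrame({"value": [10, 20, 30]}, index=["B", "C", "D"])

# Without suffixes, the overlapping "value" column makes join raise.
overlap_error = None
try:
    left.join(right, how="inner")
except ValueError as exc:
    overlap_error = exc
    print(f"join failed: {exc}")

# Suffixes disambiguate the two "value" columns.
joined = left.join(right, how="inner", lsuffix="_x", rsuffix="_y")
print(joined)
```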
`pd.concat` joins on the index and can join two or more DataFrames at once. It does a full outer join by default.
pd.concat([left, right], axis=1, sort=False)
For more information on `concat`, see this post.
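As a rough illustration of `concat` across more than two frames, the sketch below uses three made-up frames (`left`, `right`, `extra`) and compares the default outer join with `join="inner"`.

```python
import pandas as pd

# Hypothetical frames with partially overlapping index labels.
left = pd.DataFrame({"value_l": [1, 2, 3]}, index=["A", "B", "C"])
right = pd.DataFrame({"value_r": [10, 20, 30]}, index=["B", "C", "D"])
extra = pd.DataFrame({"value_e": [100, 200]}, index=["C", "D"])

# Default behaviour: full outer join on the index across all three frames.
outer = pd.concat([left, right, extra], axis=1, sort=False)

# join="inner" keeps only the labels shared by every frame.
inner = pd.concat([left, right, extra], axis=1, join="inner")

print(outer)
print(inner)
```

Here only the label `C` appears in all three frames, so the inner result has a single row.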
To perform an inner join using the index of left and a column of right, use `DataFrame.merge` with a combination of `left_index=True` and `right_on=...`.
left.merge(right, left_index=True, right_on='key')
Other joins follow a similar structure. Note that only `merge` can join the left index against a right column directly. You can join on multiple levels/columns, provided the number of index levels on the left equals the number of columns on the right.

`join` and `concat` are not capable of this kind of mixed merge. You will need to set the index as a pre-step using `DataFrame.set_index`.
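The sketch below puts the index-to-column merge, the `set_index` pre-step, and a two-level variant side by side. All frames and column names (`key`, `k1`, `k2`, `c1`, `c2`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical frames: left is keyed by its index, right by a "key" column.
left = pd.DataFrame({"value_l": [1, 2, 3]}, index=["A", "B", "C"])
right = pd.DataFrame({"key": ["B", "C", "D"], "value_r": [10, 20, 30]})

# Index of left matched against the "key" column of right (inner join).
merged = left.merge(right, left_index=True, right_on="key")

# Pre-step for join: promote "key" to right's index, then join on index.
joined = left.join(right.set_index("key"), how="inner")

# Two index levels on the left matched against two columns on the right.
left_mi = pd.DataFrame(
    {"value_l": [1, 2]},
    index=pd.MultiIndex.from_tuples([("A", 1), ("B", 2)], names=["k1", "k2"]),
)
right_mc = pd.DataFrame({"c1": ["A", "B"], "c2": [1, 3], "value_r": [10, 20]})
multi = left_mi.merge(right_mc, left_index=True, right_on=["c1", "c2"])

print(merged)
print(joined)
print(multi)
```

The `merge` result keeps `key` as a regular column, while the `set_index` route moves it into the index, so pick whichever shape suits the next step.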
This post is an abridged version of my work in Pandas Merging 101. Please follow this link for more examples and other topics on merging.