Trying to merge 2 dataframes but get ValueError

Question

These are my two dataframes saved in two variables:

> print(df.head())
>
          club_name  tr_jan  tr_dec  year
    0  ADO Den Haag    1368    1422  2010
    1  ADO Den Haag    1455    1477  2011
    2  ADO Den Haag    1461    1443  2012
    3  ADO Den Haag    1437    1383  2013
    4  ADO Den Haag    1386    1422  2014
> print(rankingdf.head())
>
           club_name  ranking  year
    0    ADO Den Haag    12    2010
    1    ADO Den Haag    13    2011
    2    ADO Den Haag    11    2012
    3    ADO Den Haag    14    2013
    4    ADO Den Haag    17    2014

I'm trying to merge these two using this code:

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

The how='left' is added because I have less datapoints in my ranking_df than in my standard df.

The expected behaviour is as such:

> print(new_df.head()) 
> 

      club_name  tr_jan  tr_dec  year    ranking
0  ADO Den Haag    1368    1422  2010    12
1  ADO Den Haag    1455    1477  2011    13
2  ADO Den Haag    1461    1443  2012    11
3  ADO Den Haag    1437    1383  2013    14
4  ADO Den Haag    1386    1422  2014    17

But I get this error:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

But I do not wish to use concat since I want to merge the trees not just add them on.

Another behaviour that's weird in my mind is that my code works if I save the first df to .csv and then load that .csv into a dataframe.

The code for that:

df = pd.DataFrame(data_points, columns=['club_name', 'tr_jan', 'tr_dec', 'year'])
df.to_csv('preliminary.csv')

df = pd.read_csv('preliminary.csv', index_col=0)

ranking_df = pd.DataFrame(rankings, columns=['club_name', 'ranking', 'year'])

new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

I think that it has to do with the index_col=0 parameter. But I have no idea to fix it without having to save it, it doesn't matter much but is kind of an annoyance that I have to do that.

User · Answer

Arnon Rotem-Gal-Oz answer is right for the most part  But I would like to point out the difference between df  year   df  year   astype int  and df year astype int   df year astype int  returns a view of the dataframe and doesn t not explicitly change the type  atleast in pandas 0 24 2  df  year   df  year   astype int  explicitly change the type because it s an assignment  I would argue that this is the safest way to permanently change the dtype of a column    Example   df   pd DataFrame   Weed     green crack    northern lights    girl scout                              cookies     Qty   10 15 3    df dtypes     Weed    object    Qty      int64   df  Qty   astype str   df dtypes     Weed    object    Qty      int64   Even setting the inplace arg to True doesn t help at times  I don t know why this  happens though  In most cases inplace True equals an explicit assignment   df  Qty   astype str  inplace   True   df dtypes     Weed    object    Qty      int64   Now the assignment   df  Qty     df  Qty   astype str  df dtypes     Weed    object    Qty     object

User · Answer

In one of your dataframes the year is a string and the other it is an int64 you can convert it first and then join   e g  df  year   df  year   astype int  or as RafaelC suggested df year astype int   Edit  Also note the comment by Anderson Zhu  Just in case you have None or missing values in one of your dataframes  you need to use Int64 instead of int  See the reference here

User · Answer

I found that my dfs both had the same type column  str  but switching from join to merge solved the issue

User · Answer

Additional  when you save df to  csv format  the datetime  year in this specific case  is saved as object  so you need to convert it into integer  year in this specific case  when you do the merge  That is why when you upload both df from csv files  you can do the merge easily  while above error will show up if one df is uploaded from csv files and the other is from an existing df  This is somewhat annoying  but have an easy solution if kept in mind

User · Answer

At first check the type of columns which you want to merge  You will see one of them is string where other one is int  Then convert it to int as following code  df  quot something quot     df  quot something quot   astype int   merged   df merge df1  on  quot something quot

User · Answer

It happens when common column in both table are of different data type.

Example: In table1, you have date as string whereas in table2 you have date as datetime. so before merging,we need to change date to common data type.

User · Answer

this simple solution works for me      final   pd concat  df  rankingdf   axis 1  sort False    but you may need to drop some duplicate column first

[python] Trying to merge 2 dataframes but get ValueError

The answer is

Examples related to python

Examples related to pandas

Examples related to dataframe

Tags