For me , df1.subtract(df2) was inconsistent. Worked correctly on one dataframe but not on the other . That was because of duplicates . df1.exceptAll(df2) returns a new dataframe with the records from df1 that do not exist in df2 , including any duplicates.