Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C

Question

I have a pandas dataframe which contains duplicate values according to two columns (A and B):

    A B C
    1 2 1
    1 2 4
    2 7 1
    3 4 0
    3 4 8

I want to remove duplicates, keeping the row with the max value in column C. This would lead to:

    A B C
    1 2 4
    2 7 1
    3 4 8

I cannot figure out how to do that. Should I use drop_duplicates(), or something else?

User · Accepted Answer

You can do it using groupby:

    c_maxes = df.groupby(['A', 'B']).C.transform(max)
    df = df.loc[df.C == c_maxes]

c_maxes is a Series of the maximum values of C in each group, but one that has the same length and the same index as df. If you haven't used .transform before, printing c_maxes is a good way to see how it works.

Another approach, using drop_duplicates, would be:

    df.sort('C').drop_duplicates(subset=['A', 'B'], take_last=True)

I'm not sure which is more efficient, but I would guess the first approach, as it doesn't involve sorting.

EDIT: From pandas 0.18 up, the second solution would be:

    df.sort_values('C').drop_duplicates(subset=['A', 'B'], keep='last')

or, alternatively:

    df.sort_values('C', ascending=False).drop_duplicates(subset=['A', 'B'])

In any case, the groupby solution seems to be significantly faster:

    %timeit -n 10 df.loc[df.groupby(['A', 'B']).C.max == df.C]
    10 loops, best of 3: 25.7 ms per loop

    %timeit -n 10 df.sort_values('C').drop_duplicates(subset=['A', 'B'], keep='last')
    10 loops, best of 3: 101 ms per loop

Note that the first timed expression, as posted, uses C.max rather than the C.transform(max) filter shown at the top of this answer, so the two timings are indicative rather than a like-for-like comparison.
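To make the two approaches concrete, here is a minimal sketch run on the data from the question. It assumes a reasonably recent pandas, and it passes the string "max" to transform instead of the builtin, which is a substitution on my part. The names by_transform and by_sort are only illustrative.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({"A": [1, 1, 2, 3, 3],
                   "B": [2, 2, 7, 4, 4],
                   "C": [1, 4, 1, 0, 8]})

# Approach 1: broadcast each group's maximum of C back onto df's index,
# then keep the rows whose C equals that maximum
c_maxes = df.groupby(["A", "B"])["C"].transform("max")
by_transform = df.loc[df["C"] == c_maxes]

# Approach 2: sort by C, then keep the last row of every (A, B) pair
by_sort = df.sort_values("C").drop_duplicates(subset=["A", "B"], keep="last")

print(c_maxes.tolist())  # [4, 4, 1, 8, 8]
print(by_transform)
print(by_sort.sort_index())
```

On this data both approaches select the same three rows. They can differ when several rows in a group tie for the maximum: the transform filter keeps every tied row, while drop_duplicates keeps exactly one per (A, B) pair.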

User · Answer

You can do it with drop_duplicates, as you wanted:

    import pandas as pd

    # initialisation
    d = pd.DataFrame({'A': [1, 1, 2, 3, 3], 'B': [2, 2, 7, 4, 4], 'C': [1, 4, 1, 0, 8]})

    d = d.sort_values("C", ascending=False)
    d = d.drop_duplicates(["A", "B"])

If it's important to keep the original row order:

    d = d.sort_index()

User · Answer

You can do this simply by using the pandas drop_duplicates function:

    df.drop_duplicates(['A', 'B'], keep='last')
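One caveat worth checking: keep='last' retains the last occurrence of each (A, B) pair, which is the row with the largest C only if the rows are already in ascending C order within each pair, as they happen to be in the question's example. The sketch below uses a small hypothetical frame (not from the question) in which the larger value comes first, to show the difference that sorting makes.

```python
import pandas as pd

# Hypothetical data: within the (1, 2) pair the larger C comes first
unordered = pd.DataFrame({"A": [1, 1], "B": [2, 2], "C": [4, 1]})

# Without sorting, the last occurrence is kept, whatever its C value
kept_as_is = unordered.drop_duplicates(["A", "B"], keep="last")

# Sorting by C first makes the last occurrence the maximum
kept_after_sort = unordered.sort_values("C").drop_duplicates(["A", "B"], keep="last")

print(kept_as_is["C"].tolist())       # [1]
print(kept_after_sort["C"].tolist())  # [4]
```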

User · Answer

I think groupby should work:

    df.groupby(['A', 'B']).max()['C']

If you need a DataFrame back, you can chain the reset_index call:

    df.groupby(['A', 'B']).max()['C'].reset_index()
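A short sketch of this aggregation on the question's data, plus one variation that is my own addition rather than part of the answer: when the frame has further columns and the whole original row is wanted, selecting by the index label of each group's maximum with idxmax is one option. The names maxes and rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 3],
                   "B": [2, 2, 7, 4, 4],
                   "C": [1, 4, 1, 0, 8]})

# Aggregate: one row per (A, B) pair holding the maximum of C.
# as_index=False keeps A and B as ordinary columns, so no reset_index is needed.
maxes = df.groupby(["A", "B"], as_index=False)["C"].max()

# Variation: idxmax returns the index label of the max-C row in each group,
# so df.loc brings back the complete original rows
rows = df.loc[df.groupby(["A", "B"])["C"].idxmax()]

print(maxes)
print(rows)
```

The aggregated result gets a fresh 0..n-1 index, whereas the idxmax variant keeps the original index labels of the selected rows.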
