python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B

Question

I have a dataframe with repeat values in column A. I want to drop duplicates, keeping the row with the highest value in column B.

So this:

    A  B
    1  10
    1  20
    2  30
    2  40
    3  10

Should turn into this:

    A  B
    1  20
    2  40
    3  10

Wes has added some nice functionality to drop duplicates: http://wesmckinney.com/blog/?p=340. But AFAICT, it's designed for exact duplicates, so there's no mention of criteria for selecting which rows get kept.

I'm guessing there's probably an easy way to do this, maybe as easy as sorting the dataframe before dropping duplicates, but I don't know groupby's internal logic well enough to figure it out. Any suggestions?

User · Answer

I think in your case you don't really need a groupby. I would sort column B in descending order, then drop duplicates on column A, and if you want you can also get a nice, clean new index like this:

    df.sort_values('B', ascending=False).drop_duplicates('A').sort_index().reset_index(drop=True)
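For reference, here is a minimal sketch of the sort-then-drop idea run on the sample data from the question. The variable names `df` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# Sort so the largest B of each A group comes first, keep the first
# row per A, then restore the original row order and renumber the index.
result = (
    df.sort_values("B", ascending=False)
      .drop_duplicates("A")
      .sort_index()
      .reset_index(drop=True)
)
print(result)
```

Dropping the final `reset_index(drop=True)` keeps the original row labels, which can be handy if you need to trace the surviving rows back to the input.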

User · Answer

While the posts already given answer the question, I made a small change by adding the column name on which the max() function is applied, for better code readability:

    df.groupby('A', as_index=False)['B'].max()

User · Answer

I would sort the dataframe first with column B descending, then drop duplicates for column A and keep the first:

    df = df.sort_values(by='B', ascending=False)
    df = df.drop_duplicates(subset='A', keep='first')

without any groupby.

User · Answer

This also works:

    a = pd.DataFrame({'A': a.groupby('A')['B'].max().index,
                      'B': a.groupby('A')['B'].max().values})

User · Answer

The top answer is doing too much work and looks to be very slow for larger data sets. apply is slow and should be avoided if possible. ix is deprecated and should be avoided as well.

    df.sort_values('B', ascending=False).drop_duplicates('A').sort_index()

       A   B
    1  1  20
    3  2  40
    4  3  10

Or simply group by all the other columns and take the max of the column you need:

    df.groupby('A', as_index=False).max()
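One thing worth checking before choosing between the two forms: with more than two columns, `groupby(...).max()` takes the maximum of each column separately, so the result may combine values from different rows. The sketch below uses a hypothetical extra column `C` (not part of the original question) to show the difference.

```python
import pandas as pd

# Hypothetical frame with an extra column C, not in the original question.
df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": [10, 20, 30],
    "C": [9, 1, 5],
})

# Column-wise max per group: B and C are maximised independently,
# so the A == 1 result pairs B = 20 with C = 9 (from different rows).
colwise = df.groupby("A", as_index=False).max()

# Sort-then-drop keeps whole rows, so B = 20 stays with its own C = 1.
rowwise = (
    df.sort_values("B", ascending=False)
      .drop_duplicates("A")
      .sort_index()
)

print(colwise)
print(rowwise)
```

If the frame only has columns A and B, as in the question, both forms give the same values.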

User · Answer

Try this:

    df.groupby(['A']).max()

User · Answer

This takes the last. Not the maximum though:

    In [10]: df.drop_duplicates(subset='A', keep="last")
    Out[10]:
       A   B
    1  1  20
    3  2  40
    4  3  10

You can also do something like:

    In [12]: df.groupby('A', group_keys=False).apply(lambda x: x.loc[x.B.idxmax()])
    Out[12]:
       A   B
    A
    1  1  20
    2  2  40
    3  3  10
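To make the "last, not maximum" point concrete, here is a small sketch on a reordered version of the question's data (the reordering is my own assumption, chosen so the last row of a group is not its maximum). It also shows an `idxmax`-based selection that avoids `apply`; this is a variation on the answer above, not the answerer's exact code.

```python
import pandas as pd

# Question's data with the first two rows swapped, so that for A == 1
# the last row (B = 10) is not the maximum (B = 20).
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [20, 10, 30, 40, 10]})

# keep="last" simply keeps the last occurrence of each A.
last = df.drop_duplicates(subset="A", keep="last")

# idxmax returns, per group, the index label of the row with the
# largest B; df.loc then pulls those whole rows out.
best = df.loc[df.groupby("A")["B"].idxmax()]

print(last)
print(best)
```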

User · Answer

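You can try this as well:

    df.drop_duplicates(subset='A', keep='last')

Note that this keeps the last row for each value of A, which is the maximum only when B is already sorted in ascending order within each A (as it is in the question's example).

I referred to https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html for this.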

User · Answer

Easiest way to do this:

- First you need to sort the DataFrame with column A ascending and column B descending.
- Then you can drop the duplicate values in column A.
- Optional: you can reset the index to get a nice, clean data frame again.

I'm going to show you all of it in one step.

    d = {'A': [1, 1, 2, 3, 1, 2, 3, 1], 'B': [30, 40, 50, 42, 38, 30, 25, 32]}
    df = pd.DataFrame(data=d)
    df

       A   B
    0  1  30
    1  1  40
    2  2  50
    3  3  42
    4  1  38
    5  2  30
    6  3  25
    7  1  32

    df = df.sort_values(['A', 'B'], ascending=[True, False]).drop_duplicates(['A']).reset_index(drop=True)
    df

       A   B
    0  1  40
    1  2  50
    2  3  42

User · Answer

Here's a variation I had to solve that's worth sharing: for each unique string in columnA I wanted to find the most common associated string in columnB.

    df.groupby('columnA').agg({'columnB': lambda x: x.mode().any()}).reset_index()

The .any() picks one if there's a tie for the mode. (Note that using .any() on a Series of ints returns a boolean rather than picking one of them.)

For the original question, the corresponding approach simplifies to:

    df.groupby('columnA').columnB.agg('max').reset_index()
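As a hedged alternative for the tie-breaking step, the sketch below swaps `.any()` for `.iloc[0]`. `Series.mode()` returns the modes in sorted order, so `.iloc[0]` picks the smallest one regardless of dtype. The sample data is made up for illustration; only the column names come from the answer above.

```python
import pandas as pd

# Made-up data: for "x" the mode is "b"; for "y" there is a tie
# between "c" and "d".
df = pd.DataFrame({
    "columnA": ["x", "x", "x", "y", "y"],
    "columnB": ["a", "b", "b", "c", "d"],
})

# mode() returns all modes sorted, so iloc[0] takes the first of them.
result = (
    df.groupby("columnA")["columnB"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(result)
```

This assumes every group has at least one non-missing value in columnB; an all-NaN group would make `mode()` empty and `.iloc[0]` would raise.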

User · Answer

I am not going to give you the whole answer (I don't think you're looking for the parsing and writing to file part anyway), but a pivotal hint should suffice: use python's set() function, and then sorted() or .sort() coupled with .reverse():

    >>> a = sorted(set([10, 60, 30, 10, 50, 20, 60, 50, 60, 10, 30]))
    >>> a
    [10, 20, 30, 50, 60]
    >>> a.reverse()
    >>> a
    [60, 50, 30, 20, 10]

User · Answer

Simplest solution:

To drop duplicates based on one column:

    df = df.drop_duplicates('column_name', keep='last')

To drop duplicates based on multiple columns:

    df = df.drop_duplicates(['col_name1', 'col_name2', 'col_name3'], keep='last')
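Since `keep='last'` only keeps whichever row happens to come last, a sketch of how it could be combined with a sort to get the highest value per key may help. The key column names follow the answer above; the value column `B` and the data are assumptions for illustration.

```python
import pandas as pd

# Made-up data with a two-column key and a value column B.
df = pd.DataFrame({
    "col_name1": ["x", "x", "y", "y"],
    "col_name2": [1, 1, 2, 2],
    "B": [5, 3, 7, 9],
})

# Sort ascending by B so the largest B of each key ends up last,
# then keep the last row per (col_name1, col_name2) pair.
result = (
    df.sort_values("B")
      .drop_duplicates(["col_name1", "col_name2"], keep="last")
      .sort_index()
)
print(result)
```

Without the `sort_values("B")` step, the ("x", 1) key would keep the row with B = 3, simply because it appears last in the input.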
