Get the row(s) which have the max value in groups using groupby

Question

How do I find all rows in a pandas DataFrame which have the max value for the count column, after grouping by the ['Sp', 'Mt'] columns?

Example 1: the following DataFrame, which I group by ['Sp', 'Mt']:

        Sp   Mt  Value  count
    0  MM1   S1      a      3
    1  MM1   S1      n      2
    2  MM1   S3     cb      5
    3  MM2   S3     mk      8
    4  MM2   S4     bg     10
    5  MM2   S4    dgd      1
    6  MM4   S2     rd      2
    7  MM4   S2     cb      2
    8  MM4   S2    uyi      7

Expected output: the rows whose count is the max within each group, like:

    0  MM1   S1      a      3
    2  MM1   S3     cb      5
    3  MM2   S3     mk      8
    4  MM2   S4     bg     10
    8  MM4   S2    uyi      7

Example 2: this DataFrame, which I group by ['Sp', 'Mt']:

        Sp   Mt  Value  count
    4  MM2   S4     bg     10
    5  MM2   S4    dgd      1
    6  MM4   S2     rd      2
    7  MM4   S2     cb      8
    8  MM4   S2    uyi      8

For the above example, I want to get all the rows where count equals the max in each group, e.g.:

    MM2   S4     bg     10
    MM4   S2     cb      8
    MM4   S2    uyi      8

User · Accepted Answer

    In [1]: df
    Out[1]:
        Sp  Mt Value  count
    0  MM1  S1     a      3
    1  MM1  S1     n      2
    2  MM1  S3    cb      5
    3  MM2  S3    mk      8
    4  MM2  S4    bg     10
    5  MM2  S4   dgd      1
    6  MM4  S2    rd      2
    7  MM4  S2    cb      2
    8  MM4  S2   uyi      7

    In [2]: df.groupby(['Mt'], sort=False)['count'].max()
    Out[2]:
    Mt
    S1     3
    S3     8
    S4    10
    S2     7
    Name: count

To get the indices of the original DF you can do:

    In [3]: idx = df.groupby(['Mt'])['count'].transform(max) == df['count']

    In [4]: df[idx]
    Out[4]:
        Sp  Mt Value  count
    0  MM1  S1     a      3
    3  MM2  S3    mk      8
    4  MM2  S4    bg     10
    8  MM4  S2   uyi      7

Note that if you have multiple max values per group, all will be returned.

Update: On a hail mary chance that this is what the OP is requesting:

    In [5]: df['count_max'] = df.groupby(['Mt'])['count'].transform(max)

    In [6]: df
    Out[6]:
        Sp  Mt Value  count  count_max
    0  MM1  S1     a      3          3
    1  MM1  S1     n      2          3
    2  MM1  S3    cb      5          8
    3  MM2  S3    mk      8          8
    4  MM2  S4    bg     10         10
    5  MM2  S4   dgd      1         10
    6  MM4  S2    rd      2          7
    7  MM4  S2    cb      2          7
    8  MM4  S2   uyi      7          7
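The answer above groups by Mt only, which is why row 2 (MM1, S3, cb, 5) is missing from Out[4]. As a minimal sketch, assuming the question's Example 1 data and grouping on both columns, the same transform idea reproduces the expected output. The string "max" is used here instead of the builtin max, which I believe newer pandas versions prefer; the variable names are my own.

```python
import pandas as pd

# Example 1 from the question.
df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# Broadcast each (Sp, Mt) group's maximum back onto every row of that group.
group_max = df.groupby(["Sp", "Mt"])["count"].transform("max")

# Keep the rows whose count equals their own group's maximum (ties included).
result = df[df["count"] == group_max]
print(result)
```

Because the comparison is row by row, the original index and row order are preserved.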

User · Answer

Use the groupby and idxmax methods:

1. Convert the date column to datetime:

       df['date'] = pd.to_datetime(df['date'])

2. Get the index of the max of the date column, after grouping by ad_id:

       idx = df.groupby(by='ad_id')['date'].idxmax()

3. Get the wanted data:

       df_max = df.loc[idx,]

    Out[54]:
       ad_id  price       date
    7     22      2 2018-06-11
    6     23      2 2018-06-22
    2     24      2 2018-06-30
    3     28      5 2018-06-22
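The three steps above can be sketched end to end as follows. The answer does not show its input frame, so the data here is hypothetical and made up purely for illustration; only the column names ad_id, price and date come from the answer.

```python
import pandas as pd

# Hypothetical ad listings (values invented for this example).
df = pd.DataFrame({
    "ad_id": [22, 22, 23, 23, 24],
    "price": [1, 2, 3, 2, 2],
    "date": ["2018-06-01", "2018-06-11", "2018-06-05", "2018-06-22", "2018-06-30"],
})

# 1. Parse the strings so that "max" means "latest date".
df["date"] = pd.to_datetime(df["date"])

# 2. For each ad_id, the index label of the row with the latest date.
idx = df.groupby("ad_id")["date"].idxmax()

# 3. Select those rows from the original frame.
df_max = df.loc[idx]
print(df_max)
```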

User · Answer

Applying nlargest to the groupby object works just as well, with the added advantage that it can also fetch the top n values if required:

    In [85]: import pandas as pd

    In [86]: df = pd.DataFrame({
        ...: 'sp' : ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
        ...: 'mt' : ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
        ...: 'val' : ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],
        ...: 'count' : [3, 2, 5, 8, 10, 1, 2, 2, 7]
        ...: })

    ## Apply nlargest(1) to find the max row per group; nlargest(n) gives the top n rows:
    In [87]: df.groupby(["sp", "mt"]).apply(lambda x: x.nlargest(1, "count")).reset_index(drop=True)
    Out[87]:
       count  mt   sp  val
    0      3  S1  MM1    a
    1      5  S3  MM1   cb
    2      8  S3  MM2   mk
    3     10  S4  MM2   bg
    4      7  S2  MM4  uyi
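For the top-n case mentioned above, here is a sketch of a different route that avoids groupby.apply: sort by count in descending order and take the first n rows of each group with head. This is a swapped-in technique, not the answer's own code, and n = 2 is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "val": ["a", "n", "cb", "mk", "bg", "dgb", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

n = 2  # how many rows to keep per group (arbitrary for this example)

top_n = (
    df.sort_values("count", ascending=False)  # largest counts first
      .groupby(["sp", "mt"])
      .head(n)                                # first n rows of each group
      .sort_index()                           # back to the original row order
)
print(top_n)
```

Groups with fewer than n rows simply contribute all of their rows. When counts are tied at the cut-off (rows 6 and 7 here), which tied row is kept depends on the sort, so do not rely on it.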

User · Answer

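    df = pd.DataFrame({
        'sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
        'mt': ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
        'val': ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],
        'count': [3, 2, 5, 8, 10, 1, 2, 2, 7]
    })

    df.groupby(['sp', 'mt']).apply(lambda grp: grp.nlargest(1, 'count'))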

User · Answer

An easy solution is to apply the idxmax() function to get the indices of the rows with max values. This selects the row with the max value in each group (idxmax returns only the first such row when a group has ties).

    In [365]: import pandas as pd

    In [366]: df = pd.DataFrame({
         ...: 'sp' : ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
         ...: 'mt' : ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
         ...: 'val' : ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],
         ...: 'count' : [3, 2, 5, 8, 10, 1, 2, 2, 7]
         ...: })

    In [367]: df
    Out[367]:
       count  mt   sp  val
    0      3  S1  MM1    a
    1      2  S1  MM1    n
    2      5  S3  MM1   cb
    3      8  S3  MM2   mk
    4     10  S4  MM2   bg
    5      1  S4  MM2  dgb
    6      2  S2  MM4   rd
    7      2  S2  MM4   cb
    8      7  S2  MM4  uyi

    ### Apply idxmax() and use .loc[] on the DataFrame to select the rows with max values:
    In [368]: df.loc[df.groupby(["sp", "mt"])["count"].idxmax()]
    Out[368]:
       count  mt   sp  val
    0      3  S1  MM1    a
    2      5  S3  MM1   cb
    3      8  S3  MM2   mk
    4     10  S4  MM2   bg
    8      7  S2  MM4  uyi

    ### Just to show what values are returned by .idxmax() above:
    In [369]: df.groupby(["sp", "mt"])["count"].idxmax().values
    Out[369]: array([0, 2, 3, 4, 8])
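One thing worth checking against the question's Example 2 is how idxmax treats ties. A small sketch using that data suggests it returns one label per group (the first occurrence of the max), whereas comparing against a transformed max keeps every tied row. Variable names are my own.

```python
import pandas as pd

# Example 2 from the question: rows 7 and 8 tie for the max in group (MM4, S2).
df = pd.DataFrame(
    {
        "Sp": ["MM2", "MM2", "MM4", "MM4", "MM4"],
        "Mt": ["S4", "S4", "S2", "S2", "S2"],
        "Value": ["bg", "dgd", "rd", "cb", "uyi"],
        "count": [10, 1, 2, 8, 8],
    },
    index=[4, 5, 6, 7, 8],
)

grouped = df.groupby(["Sp", "Mt"])["count"]

# idxmax yields a single index label per group.
first_only = df.loc[grouped.idxmax()]

# The transform comparison keeps every row equal to its group's max.
all_ties = df[df["count"] == grouped.transform("max")]

print(first_only)
print(all_ties)
```

So idxmax fits when exactly one row per group is wanted, while the transform comparison matches the "all rows where count equals max" requirement.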

User · Answer

For me, the easiest solution is to keep the rows whose count equals the group maximum. The following one-liner is enough:

    df[df['count'] == df.groupby(['Mt'])['count'].transform(max)]

User · Answer

I've been using this functional style for many group operations:

    df = pd.DataFrame({
        'Sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
        'Mt': ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
        'Val': ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],
        'Count': [3, 2, 5, 8, 10, 1, 2, 2, 7]
    })

    df.groupby('Mt')\
      .apply(lambda group: group[group.Count == group.Count.max()])\
      .reset_index(drop=True)

        Sp  Mt  Val  Count
    0  MM1  S1    a      3
    1  MM4  S2  uyi      7
    2  MM2  S3   mk      8
    3  MM2  S4   bg     10

.reset_index(drop=True) drops the group index and leaves you with a plain 0..n-1 integer index (not the original row labels).

User · Answer

You may not need groupby at all; you can use sort_values + drop_duplicates:

    df.sort_values('count').drop_duplicates(['Sp', 'Mt'], keep='last')
    Out[190]:
        Sp  Mt Value  count
    0  MM1  S1     a      3
    2  MM1  S3    cb      5
    8  MM4  S2   uyi      7
    3  MM2  S3    mk      8
    4  MM2  S4    bg     10

Almost the same logic works with tail:

    df.sort_values('count').groupby(['Sp', 'Mt']).tail(1)
    Out[52]:
        Sp  Mt Value  count
    0  MM1  S1     a      3
    2  MM1  S3    cb      5
    8  MM4  S2   uyi      7
    3  MM2  S3    mk      8
    4  MM2  S4    bg     10
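Note that both outputs above come back ordered by count rather than by the original row order. If the original order matters, a trailing sort_index() is one way to restore it. A minimal sketch on the question's Example 1 data:

```python
import pandas as pd

df = pd.DataFrame({
    "Sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "Mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "Value": ["a", "n", "cb", "mk", "bg", "dgd", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

result = (
    df.sort_values("count")                         # smallest count first
      .drop_duplicates(["Sp", "Mt"], keep="last")   # keep each group's largest
      .sort_index()                                 # restore original row order
)
print(result)
```

Keep in mind that this approach returns exactly one row per group, so tied maxima are not all kept.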

User · Answer

Try using nlargest on the groupby object. The advantage of using nlargest is that it returns the index of the rows where the nlargest item(s) were fetched from.

Note: we slice the second (1) element of our index, since our index in this case consists of tuples, e.g. ('S1', 0).

    df = pd.DataFrame({
        'sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
        'mt': ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
        'val': ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],
        'count': [3, 2, 5, 8, 10, 1, 2, 2, 7]
    })

    d = df.groupby('mt')['count'].nlargest(1)  # pass 1 since we want the max

    df.iloc[[i[1] for i in d.index], :]  # pass the index of d as a list comprehension
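The iloc lookup above works because the frame has the default 0..n-1 index, so labels and positions coincide. As a sketch of a label-based variant (my own adaptation, not the answer's code), the last level of the MultiIndex can be taken with get_level_values and passed to loc, which should also hold when the index is not a plain range:

```python
import pandas as pd

df = pd.DataFrame({
    "sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "val": ["a", "n", "cb", "mk", "bg", "dgb", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

# Series indexed by (mt, original row label), one entry per group.
d = df.groupby("mt")["count"].nlargest(1)

# The last index level holds the original row labels; select by label.
rows = df.loc[d.index.get_level_values(-1)]
print(rows)
```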

User · Answer

Having tried the solution suggested by Zelazny on a relatively large DataFrame (~400k rows), I found it to be very slow. Here is an alternative that I found to run orders of magnitude faster on my data set.

    df = pd.DataFrame({
        'sp': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
        'mt': ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
        'val': ['a', 'n', 'cb', 'mk', 'bg', 'dgb', 'rd', 'cb', 'uyi'],
        'count': [3, 2, 5, 8, 10, 1, 2, 2, 7]
    })

    df_grouped = df.groupby(['sp', 'mt']).agg({'count': 'max'})

    df_grouped = df_grouped.reset_index()

    df_grouped = df_grouped.rename(columns={'count': 'count_max'})

    df = pd.merge(df, df_grouped, how='left', on=['sp', 'mt'])

    df = df[df['count'] == df['count_max']]
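The merge steps above can be wrapped up and checked against the transform approach. The helper name max_rows_via_merge is hypothetical (not a pandas function), and this sketch only checks that both routes select the same rows on the sample data; it makes no claim about timing, which will depend on your data and pandas version.

```python
import pandas as pd


def max_rows_via_merge(df, keys, col):
    """Hypothetical helper: rows of df whose `col` equals the per-group max."""
    max_col = col + "_max"
    # One row per group holding that group's maximum.
    group_max = (
        df.groupby(keys, as_index=False)[col].max()
          .rename(columns={col: max_col})
    )
    # Attach the group maximum to every row, then filter on equality.
    merged = df.merge(group_max, how="left", on=keys)
    return merged[merged[col] == merged[max_col]].drop(columns=max_col)


df = pd.DataFrame({
    "sp": ["MM1", "MM1", "MM1", "MM2", "MM2", "MM2", "MM4", "MM4", "MM4"],
    "mt": ["S1", "S1", "S3", "S3", "S4", "S4", "S2", "S2", "S2"],
    "val": ["a", "n", "cb", "mk", "bg", "dgb", "rd", "cb", "uyi"],
    "count": [3, 2, 5, 8, 10, 1, 2, 2, 7],
})

via_merge = max_rows_via_merge(df, ["sp", "mt"], "count")
via_transform = df[df["count"] == df.groupby(["sp", "mt"])["count"].transform("max")]
print(via_merge)
```

Unlike the transform comparison, merge builds a new frame with a fresh integer index, so reset or carry the original index explicitly if you need it afterwards.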

User · Answer

You can sort the DataFrame by count and then remove duplicates. I think it's easier:

    df.sort_values("count", ascending=False).drop_duplicates(['Sp', 'Mt'])
