Find row where values for column is maximal in a pandas DataFrame

Question

How can I find the row for which the value of a specific column is maximal? df.max() will give me the maximal value for each column, but I don't know how to get the corresponding row.

User · Answer

The direct "argmax" solution does not work for me. The previous example provided by @ely:

    >>> import pandas
    >>> import numpy as np
    >>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
    >>> df
              A         B         C
    0  1.232853 -1.979459 -0.573626
    1  0.140767  0.394940  1.068890
    2  0.742023  1.343977 -0.579745
    3  2.125299 -0.649328 -0.211692
    4 -0.187253  1.908618 -1.862934
    >>> df['A'].argmax()
    3
    >>> df['B'].argmax()
    4
    >>> df['C'].argmax()
    1

returns the following message:

    FutureWarning: 'argmax' is deprecated, use 'idxmax' instead. The behavior of 'argmax' will be corrected to return the positional maximum in the future. Use 'series.values.argmax' to get the position of the maximum now.

So my solution is:

    df['A'].values.argmax()
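For readers who want to try the positional lookup this answer ends on, here is a minimal sketch. The data and the string index are made up for illustration, and `to_numpy()` is used where the answer uses `.values`; both hand the column to NumPy, so the `argmax` call is NumPy's and returns an integer position.

```python
import pandas as pd

# Illustrative data (an assumption), not the random frame from the answer.
df = pd.DataFrame(
    {"A": [1.2, 0.1, 0.7, 2.1, -0.2], "B": [-2.0, 0.4, 1.3, -0.6, 1.9]},
    index=["a", "b", "c", "d", "e"],
)

# Convert the column to a NumPy array first, so argmax is positional.
pos = df["A"].to_numpy().argmax()

# iloc selects by integer position, whatever the index labels are.
row = df.iloc[pos]

print(pos, row.name)
```

With a string index like this one, the position and the label differ, which makes it easy to see which of the two a given call returned.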

User · Answer

A more compact and readable solution uses query():

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(5,3), columns=['A','B','C'])
    print(df)

    # find row with maximum A
    df.query('A == A.max()')

It also returns a DataFrame instead of a Series, which is handy for some use cases.
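To make the DataFrame-versus-Series point concrete, the sketch below runs query() next to a label lookup on a small made-up frame. The variable names `as_frame` and `as_series` are only for illustration.

```python
import pandas as pd

# Small made-up frame so the result is reproducible.
df = pd.DataFrame({"A": [3, 7, 5], "B": [10, 20, 30]})

# query() filters rows, so the result is still a DataFrame (one row here).
as_frame = df.query("A == A.max()")

# Looking up a single label with loc returns that row as a Series.
as_series = df.loc[df["A"].idxmax()]

print(type(as_frame).__name__, type(as_series).__name__)
```

Keeping a DataFrame is convenient when the result is passed on to code that expects columns, for example a later merge or concat.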

User · Answer

If you want the entire row instead of just the id, you can use df.nlargest, passing in how many "top" rows you want and which column or columns to rank by:

    df.nlargest(2, ['A'])

This gives you the rows corresponding to the top 2 values of A. Use df.nsmallest for the minimum values.
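A short sketch of nlargest on made-up data with a tie for the maximum. It also uses the `keep` parameter, which the answer does not mention: the default `keep="first"` returns one of the tied rows, while `keep="all"` returns every row tied for the last requested place.

```python
import pandas as pd

# Made-up data: rows 1 and 2 tie for the maximum of A.
df = pd.DataFrame({"A": [4, 9, 9, 1], "B": ["w", "x", "y", "z"]})

# Default keep="first": exactly one row, the first of the tied rows.
top_first = df.nlargest(1, "A")

# keep="all": every row tied for the top spot, so possibly more than n rows.
top_all = df.nlargest(1, "A", keep="all")

print(top_first)
print(top_all)
```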

User · Answer

The idxmax of the DataFrame returns the label index of the row with the maximum value, and the behavior of argmax depends on the version of pandas (right now it returns a warning). If you want to use the positional index, you can do the following:

    max_row = df['A'].values.argmax()

or

    import numpy as np
    max_row = np.argmax(df['A'].values)

Note that np.argmax(df['A']) behaves the same as df['A'].argmax().

User · Answer

You might also try idxmax:

    In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])

    In [6]: df
    Out[6]:
              A         B         C
    0  2.001289  0.482561  1.579985
    1 -0.991646 -0.387835  1.320236
    2  0.143826 -1.096889  1.486508
    3 -0.193056 -0.499020  1.536540
    4 -2.083647 -3.074591  0.175772
    5 -0.186138 -1.949731  0.287432
    6 -0.480790 -1.771560 -0.930234
    7  0.227383 -0.278253  2.102004
    8 -0.002592  1.434192 -1.624915
    9  0.404911 -2.167599 -0.452900

    In [7]: df.idxmax()
    Out[7]:
    A    0
    B    8
    C    7

e.g.

    In [8]: df.loc[df['A'].idxmax()]
    Out[8]:
    A    2.001289
    B    0.482561
    C    1.579985
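The same idxmax pattern, sketched on a small frame with string row labels (made-up data) so that it is visible that idxmax returns labels and that loc is the matching accessor.

```python
import pandas as pd

# Made-up data with string labels instead of the default 0..n-1 index.
df = pd.DataFrame(
    {"A": [0.5, 2.0, -1.0], "B": [1.5, 0.2, 0.9]},
    index=["x", "y", "z"],
)

# One label per column: the row label where that column is largest.
labels = df.idxmax()

# loc is label-based, so it pairs naturally with idxmax.
row = df.loc[labels["A"]]

print(labels)
print(row)
```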

User · Answer

    df.iloc[df['columnX'].argmax()]

argmax() provides the index corresponding to the max value of columnX (in current pandas versions, this is the integer position). iloc can then be used to get the row of the DataFrame df at that index.

User · Answer

Use the pandas idxmax function. It's straightforward:

    >>> import pandas
    >>> import numpy as np
    >>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
    >>> df
              A         B         C
    0  1.232853 -1.979459 -0.573626
    1  0.140767  0.394940  1.068890
    2  0.742023  1.343977 -0.579745
    3  2.125299 -0.649328 -0.211692
    4 -0.187253  1.908618 -1.862934
    >>> df['A'].idxmax()
    3
    >>> df['B'].idxmax()
    4
    >>> df['C'].idxmax()
    1

Alternatively you could also use numpy.argmax, such as numpy.argmax(df['A']). It provides the same thing, and appears at least as fast as idxmax in cursory observations.

idxmax() returns index labels, not integers.

Example: if you have string values as your index labels, like rows 'a' through 'e', you might want to know that the max occurs in row 4 (not row 'd'). If you want the integer position of that label within the Index, you have to get it manually (which can be tricky now that duplicate row labels are allowed).

HISTORICAL NOTES:

idxmax() used to be called argmax() prior to 0.11.
argmax was deprecated prior to 1.0.0, and the deprecated label-returning behavior was removed in 1.0.0 (Series.argmax in current pandas returns the integer position).
Back as of Pandas 0.16, argmax used to exist and perform the same function (though it appeared to run more slowly than idxmax).
The argmax function returned the integer position within the index of the row location of the maximum element.
pandas moved to using row labels instead of integer indices. Positional integer indices used to be very common, more common than labels, especially in applications where duplicate row labels are common.

For example, consider this toy DataFrame with a duplicate row label:

    In [19]: dfrm
    Out[19]:
              A         B         C
    a  0.143693  0.653810  0.586007
    b  0.623582  0.312903  0.919076
    c  0.165438  0.889809  0.000967
    d  0.308245  0.787776  0.571195
    e  0.870068  0.935626  0.606911
    f  0.037602  0.855193  0.728495
    g  0.605366  0.338105  0.696460
    h  0.000000  0.090814  0.963927
    i  0.688343  0.188468  0.352213
    i  0.879000  0.105039  0.900260

    In [20]: dfrm['A'].idxmax()
    Out[20]: 'i'

    In [21]: dfrm.loc[dfrm['A'].idxmax()]  # .ix instead of .loc in older versions of pandas
    Out[21]:
              A         B         C
    i  0.688343  0.188468  0.352213
    i  0.879000  0.105039  0.900260

So here a naive use of idxmax is not sufficient, whereas the old form of argmax would correctly provide the positional location of the max row (in this case, position 9).

This is exactly one of those nasty kinds of bug-prone behaviors in dynamically typed languages that makes this sort of thing so unfortunate, and worth beating a dead horse over. If you are writing systems code and your system suddenly gets used on some data sets that are not cleaned properly before being joined, it's very easy to end up with duplicate row labels, especially string labels like a CUSIP or SEDOL identifier for financial assets. You can't easily use the type system to help you out, and you may not be able to enforce uniqueness on the index without running into unexpectedly missing data.

So you're left with hoping that your unit tests covered everything (they didn't, or more likely no one wrote any tests). Otherwise, most likely, you're just left waiting to see if you happen to smack into this error at runtime, in which case you probably have to go drop many hours' worth of work from the database you were outputting results to, bang your head against the wall in IPython trying to manually reproduce the problem, finally figure out that it's because idxmax can only report the label of the max row, and then be disappointed that no standard function automatically gets the positions of the max row for you, write a buggy implementation yourself, edit the code, and pray you don't run into the problem again.
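The duplicate-label pitfall described in this answer can be reproduced with a much smaller frame. This is a sketch with made-up values; the positional workaround goes through NumPy and is one possible approach, not something the answer prescribes.

```python
import pandas as pd

# Made-up three-row frame with a duplicated row label 'i'.
dfrm = pd.DataFrame({"A": [0.14, 0.69, 0.88]}, index=["a", "i", "i"])

# idxmax reports the label, which is ambiguous here.
label = dfrm["A"].idxmax()

# A label lookup returns every row carrying that label.
by_label = dfrm.loc[label]

# Going through NumPy gives the integer position of the maximum instead.
pos = dfrm["A"].to_numpy().argmax()
by_pos = dfrm.iloc[pos]

# A cheap guard before relying on label lookups.
print(dfrm.index.is_unique)
print(len(by_label), pos, by_pos["A"])
```

Checking `index.is_unique` up front is a cheap way to detect this situation before a label-based lookup silently returns more than one row.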

User · Answer

    mx.iloc[0].idxmax()

This one line returns the column label of the maximum value in a single row of the DataFrame. Here mx is the DataFrame and iloc[0] selects the row at position 0.
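Note that this looks along a row rather than down a column. A small sketch with made-up data shows what the call returns, together with `idxmax(axis=1)`, which does the same for every row at once.

```python
import pandas as pd

# Made-up frame; mx is the name used in the answer.
mx = pd.DataFrame({"A": [1, 8], "B": [5, 2], "C": [3, 4]})

# Column label holding the largest value in the first row.
first_row_col = mx.iloc[0].idxmax()

# The same question answered for every row in one call.
per_row_col = mx.idxmax(axis=1)

print(first_row_col)
print(per_row_col)
```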

User · Answer

Very simple: we have df as below and we want to print the row with the max value in C:

    A  B  C
    x  1  4
    y  2  10
    z  5  9

    In:  df.loc[df['C'] == df['C'].max()]   # condition check

    Out:
    A  B  C
    y  2  10

User · Answer

Both of the above answers return only one index if multiple rows take the maximum value. If you want all the rows, there does not seem to be a dedicated function, but it is not hard to do. Below is an example for a Series; the same can be done for a DataFrame:

    In [1]: from pandas import Series, DataFrame

    In [2]: s = Series([2,4,4,3], index=['a','b','c','d'])

    In [3]: s.idxmax()
    Out[3]: 'b'

    In [4]: s[s==s.max()]
    Out[4]:
    b    4
    c    4
    dtype: int64
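The DataFrame version that the answer alludes to might look like the sketch below. The data is made up, and `mask` and `all_max` are illustrative names.

```python
import pandas as pd

# Made-up frame in which two rows share the maximum of C.
df = pd.DataFrame({"A": ["x", "y", "z", "w"], "C": [4, 10, 9, 10]})

# Boolean mask: True wherever C equals the column maximum.
mask = df["C"] == df["C"].max()

# Indexing with the mask keeps every tied row, not just the first.
all_max = df[mask]

print(all_max)
```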
