[python] Find row where values for column is maximal in a pandas DataFrame

How can I find the row for which the value of a specific column is maximal?

df.max() will give me the maximal value for each column, I don't know how to get the corresponding row.

This question is related to python pandas argmax

The answer is


df.iloc[df['columnX'].argmax()]

argmax() would provide the index corresponding to the max value for the columnX. iloc can be used to get the row of the DataFrame df for this index.


You might also try idxmax:

In [5]: df = pandas.DataFrame(np.random.randn(10,3),columns=['A','B','C'])

In [6]: df
Out[6]: 
          A         B         C
0  2.001289  0.482561  1.579985
1 -0.991646 -0.387835  1.320236
2  0.143826 -1.096889  1.486508
3 -0.193056 -0.499020  1.536540
4 -2.083647 -3.074591  0.175772
5 -0.186138 -1.949731  0.287432
6 -0.480790 -1.771560 -0.930234
7  0.227383 -0.278253  2.102004
8 -0.002592  1.434192 -1.624915
9  0.404911 -2.167599 -0.452900

In [7]: df.idxmax()
Out[7]: 
A    0
B    8
C    7

e.g.

In [8]: df.loc[df['A'].idxmax()]
Out[8]: 
A    2.001289
B    0.482561
C    1.579985

The direct ".argmax()" solution does not work for me.

The previous example provided by @ely

>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
>>> df
      A         B         C
0  1.232853 -1.979459 -0.573626
1  0.140767  0.394940  1.068890
2  0.742023  1.343977 -0.579745
3  2.125299 -0.649328 -0.211692
4 -0.187253  1.908618 -1.862934
>>> df['A'].argmax()
3
>>> df['B'].argmax()
4
>>> df['C'].argmax()
1

returns the following message :

FutureWarning: 'argmax' is deprecated, use 'idxmax' instead. The behavior of 'argmax' 
will be corrected to return the positional maximum in the future.
Use 'series.values.argmax' to get the position of the maximum now.

So that my solution is :

df['A'].values.argmax()

Both above answers would only return one index if there are multiple rows that take the maximum value. If you want all the rows, there does not seem to have a function. But it is not hard to do. Below is an example for Series; the same can be done for DataFrame:

In [1]: from pandas import Series, DataFrame

In [2]: s=Series([2,4,4,3],index=['a','b','c','d'])

In [3]: s.idxmax()
Out[3]: 'b'

In [4]: s[s==s.max()]
Out[4]: 
b    4
c    4
dtype: int64

If you want the entire row instead of just the id, you can use df.nlargest and pass in how many 'top' rows you want and you can also pass in for which column/columns you want it for.

df.nlargest(2,['A'])

will give you the rows corresponding to the top 2 values of A.

use df.nsmallest for min values.


mx.iloc[0].idxmax()

This one line of code will give you how to find the maximum value from a row in dataframe, here mx is the dataframe and iloc[0] indicates the 0th index.


Very simple: we have df as below and we want to print a row with max value in C:

A  B  C
x  1  4
y  2  10
z  5  9

In:

df.loc[df['C'] == df['C'].max()]   # condition check

Out:

A B C
y 2 10

A more compact and readable solution using query() is like this:

import pandas as pd

df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
print(df)

# find row with maximum A
df.query('A == A.max()')

It also returns a DataFrame instead of Series, which would be handy for some use cases.


The idmax of the DataFrame returns the label index of the row with the maximum value and the behavior of argmax depends on version of pandas (right now it returns a warning). If you want to use the positional index, you can do the following:

max_row = df['A'].values.argmax()

or

import numpy as np
max_row = np.argmax(df['A'].values)

Note that if you use np.argmax(df['A']) behaves the same as df['A'].argmax().