[python] Get index of a row of a pandas dataframe as an integer

Assume an easy dataframe, for example

    A         B
0   1  0.810743
1   2  0.595866
2   3  0.154888
3   4  0.472721
4   5  0.894525
5   6  0.978174
6   7  0.859449
7   8  0.541247
8   9  0.232302
9  10  0.276566

How can I retrieve an index value of a row, given a condition? For example: dfb = df[df['A']==5].index.values.astype(int) returns [4], but what I would like to get is just 4. This is causing me troubles later in the code.

Based on some conditions, I want to have a record of the indexes where that condition is fulfilled, and then select rows between.

I tried

dfb = df[df['A']==5].index.values.astype(int)
dfbb = df[df['A']==8].index.values.astype(int)
df.loc[dfb:dfbb,'B']

for a desired output

    A         B
4   5  0.894525
5   6  0.978174
6   7  0.859449

but I get TypeError: '[4]' is an invalid key

This question is related to python pandas numpy

The answer is


The easier is add [0] - select first value of list with one element:

dfb = df[df['A']==5].index.values.astype(int)[0]
dfbb = df[df['A']==8].index.values.astype(int)[0]

dfb = int(df[df['A']==5].index[0])
dfbb = int(df[df['A']==8].index[0])

But if possible some values not match, error is raised, because first value not exist.

Solution is use next with iter for get default parameetr if values not matched:

dfb = next(iter(df[df['A']==5].index), 'no match')
print (dfb)
4

dfb = next(iter(df[df['A']==50].index), 'no match')
print (dfb)
no match

Then it seems need substract 1:

print (df.loc[dfb:dfbb-1,'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64

Another solution with boolean indexing or query:

print (df[(df['A'] >= 5) & (df['A'] < 8)])
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

print (df.loc[(df['A'] >= 5) & (df['A'] < 8), 'B'])
4    0.894525
5    0.978174
6    0.859449
Name: B, dtype: float64

print (df.query('A >= 5 and A < 8'))
   A         B
4  5  0.894525
5  6  0.978174
6  7  0.859449

Little sum up for searching by row:

This can be useful if you don't know the column values ??or if columns have non-numeric values

if u want get index number as integer u can also do:

item = df[4:5].index.item()
print(item)
4

it also works in numpy / list:

numpy = df[4:7].index.to_numpy()[0]
lista = df[4:7].index.to_list()[0]

in [x] u pick number in range [4:7], for example if u want 6:

numpy = df[4:7].index.to_numpy()[2]
print(numpy)
6

for DataFrame:

df[4:7]

    A          B
4   5   0.894525
5   6   0.978174
6   7   0.859449

or:

df[(df.index>=4) & (df.index<7)]

    A          B
4   5   0.894525
5   6   0.978174
6   7   0.859449   

To answer the original question on how to get the index as an integer for the desired selection, the following will work :

df[df['A']==5].index.item()

The nature of wanting to include the row where A == 5 and all rows upto but not including the row where A == 8 means we will end up using iloc (loc includes both ends of slice).

In order to get the index labels we use idxmax. This will return the first position of the maximum value. I run this on a boolean series where A == 5 (then when A == 8) which returns the index value of when A == 5 first happens (same thing for A == 8).

Then I use searchsorted to find the ordinal position of where the index label (that I found above) occurs. This is what I use in iloc.

i5, i8 = df.index.searchsorted([df.A.eq(5).idxmax(), df.A.eq(8).idxmax()])
df.iloc[i5:i8]

enter image description here


numpy

you can further enhance this by using the underlying numpy objects the analogous numpy functions. I wrapped it up into a handy function.

def find_between(df, col, v1, v2):
    vals = df[col].values
    mx1, mx2 = (vals == v1).argmax(), (vals == v2).argmax()
    idx = df.index.values
    i1, i2 = idx.searchsorted([mx1, mx2])
    return df.iloc[i1:i2]

find_between(df, 'A', 5, 8)

enter image description here


timing
enter image description here


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index