[python] Python Pandas: Get index of rows which column matches certain value

Given a DataFrame with a column "BoolCol", we want to find the indexes of the DataFrame in which the values for "BoolCol" == True

I currently have the iterating way to do it, which works perfectly:

for i in range(100,3000):
    if df.iloc[i]['BoolCol']== True:
         print i,df.iloc[i]['BoolCol']

But this is not the correct panda's way to do it. After some research, I am currently using this code:

df[df['BoolCol'] == True].index.tolist()

This one gives me a list of indexes, but they dont match, when I check them by doing:

df.iloc[i]['BoolCol']

The result is actually False!!

Which would be the correct Pandas way to do this?

This question is related to python indexing pandas

The answer is


Can be done using numpy where() function:

import pandas as pd
import numpy as np

In [716]: df = pd.DataFrame({"gene_name": ['SLC45A1', 'NECAP2', 'CLIC4', 'ADC', 'AGBL4'] , "BoolCol": [False, True, False, True, True] },
       index=list("abcde"))

In [717]: df
Out[717]: 
  BoolCol gene_name
a   False   SLC45A1
b    True    NECAP2
c   False     CLIC4
d    True       ADC
e    True     AGBL4

In [718]: np.where(df["BoolCol"] == True)
Out[718]: (array([1, 3, 4]),)

In [719]: select_indices = list(np.where(df["BoolCol"] == True)[0])

In [720]: df.iloc[select_indices]
Out[720]: 
  BoolCol gene_name
b    True    NECAP2
d    True       ADC
e    True     AGBL4

Though you don't always need index for a match, but incase if you need:

In [796]: df.iloc[select_indices].index
Out[796]: Index([u'b', u'd', u'e'], dtype='object')

In [797]: df.iloc[select_indices].index.tolist()
Out[797]: ['b', 'd', 'e']

If you want to use your dataframe object only once, use:

df['BoolCol'].loc[lambda x: x==True].index

Simple way is to reset the index of the DataFrame prior to filtering:

df_reset = df.reset_index()
df_reset[df_reset['BoolCol']].index.tolist()

Bit hacky, but it's quick!


First you may check query when the target column is type bool (PS: about how to use it please check link )

df.query('BoolCol')
Out[123]: 
    BoolCol
10     True
40     True
50     True

After we filter the original df by the Boolean column we can pick the index .

df=df.query('BoolCol')
df.index
Out[125]: Int64Index([10, 40, 50], dtype='int64')

Also pandas have nonzero, we just select the position of True row and using it slice the DataFrame or index

df.index[df.BoolCol.nonzero()[0]]
Out[128]: Int64Index([10, 40, 50], dtype='int64')

I extended this question that is how to gets the row, columnand value of all matches value?

here is solution:

import pandas as pd
import numpy as np


def search_coordinate(df_data: pd.DataFrame, search_set: set) -> list:
    nda_values = df_data.values
    tuple_index = np.where(np.isin(nda_values, [e for e in search_set]))
    return [(row, col, nda_values[row][col]) for row, col in zip(tuple_index[0], tuple_index[1])]


if __name__ == '__main__':
    test_datas = [['cat', 'dog', ''],
                  ['goldfish', '', 'kitten'],
                  ['Puppy', 'hamster', 'mouse']
                  ]
    df_data = pd.DataFrame(test_datas)
    print(df_data)
    result_list = search_coordinate(df_data, {'dog', 'Puppy'})
    print(f"\n\n{'row':<4} {'col':<4} {'name':>10}")
    [print(f"{row:<4} {col:<4} {name:>10}") for row, col, name in result_list]

Output:

          0        1       2
0       cat      dog        
1  goldfish           kitten
2     Puppy  hamster   mouse


row  col        name
0    1           dog
2    0         Puppy

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to indexing

numpy array TypeError: only integer scalar arrays can be converted to a scalar index How to print a specific row of a pandas DataFrame? What does 'index 0 is out of bounds for axis 0 with size 0' mean? How does String.Index work in Swift Pandas KeyError: value not in index Update row values where certain condition is met in pandas Pandas split DataFrame by column value Rebuild all indexes in a Database How are iloc and loc different? pandas loc vs. iloc vs. at vs. iat?

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float