How to determine whether a Pandas Column contains a particular value

Question

I am trying to determine whether there is an entry in a Pandas column that has a particular value. I tried to do this with `if x in df['id']`. I thought this was working, except when I fed it a value that I knew was not in the column: `43 in df['id']` still returned `True`. When I subset to a data frame only containing entries matching the missing id, `df[df['id'] == 43]`, there are, obviously, no entries in it. How do I determine if a column in a Pandas data frame contains a particular value, and why doesn't my current method work? (FYI, I have the same problem when I use the implementation in this answer to a similar question.)

User · Answer

You can also use `pandas.Series.isin`, although it's a little bit longer than `'a' in s.values`:

    In [2]: s = pd.Series(list('abc'))

    In [3]: s
    Out[3]:
    0    a
    1    b
    2    c
    dtype: object

    In [4]: s.isin(['a'])
    Out[4]:
    0     True
    1    False
    2    False
    dtype: bool

    In [5]: s[s.isin(['a'])].empty
    Out[5]: False

    In [6]: s[s.isin(['z'])].empty
    Out[6]: True

But this approach can be more flexible if you need to match multiple values at once for a DataFrame (see `DataFrame.isin`):

    >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
    >>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
           A      B
    0   True  False  # Note that B didn't match 1 here.
    1  False   True
    2   True   True
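A self-contained version of the session above, assuming only that pandas is installed. It wraps the check in a small helper; the name `contains_any` is hypothetical, not a pandas function, and simply combines `Series.isin` with `Series.any`.

```python
import pandas as pd


def contains_any(series: pd.Series, values) -> bool:
    """Return True if any element of `series` equals one of `values`.

    Hypothetical helper: it just combines Series.isin with Series.any.
    """
    return bool(series.isin(values).any())


s = pd.Series(list("abc"))
print(contains_any(s, ["a"]))       # True
print(contains_any(s, ["z"]))       # False
print(contains_any(s, ["z", "c"]))  # True: one of the values matches

# DataFrame.isin with a dict checks each column against its own list of values.
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 4, 7]})
mask = df.isin({"A": [1, 3], "B": [4, 7, 12]})
print(mask)
# Per column: does it contain at least one of its listed values?
print(mask.any())
```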

User · Answer

Or use `Series.tolist` or `Series.any`:

    >>> s = pd.Series(list('abc'))
    >>> s
    0    a
    1    b
    2    c
    dtype: object
    >>> 'a' in s.tolist()
    True
    >>> (s == 'a').any()
    True

`Series.tolist` makes a list out of a Series. For the other one, I am just getting a boolean Series from a regular Series, then checking whether there are any `True` values in that boolean Series.
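Both checks as a runnable script, assuming pandas is installed. `tolist()` builds a plain Python list, so `in` uses ordinary list membership, while `s == 'a'` builds a boolean Series that `.any()` reduces to a single value.

```python
import pandas as pd

s = pd.Series(list("abc"))

# Option 1: convert to a Python list, then use list membership.
in_list = "a" in s.tolist()

# Option 2: element-wise comparison gives a boolean Series; any() reduces it.
mask = s == "a"
in_mask = bool(mask.any())

print(mask.tolist())     # [True, False, False]
print(in_list, in_mask)  # True True
print("z" in s.tolist(), bool((s == "z").any()))  # False False
```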

User · Answer

Simple condition:

    if any(str(elem) in ['a', 'b'] for elem in df['column'].tolist()):
        print('found')  # at least one element, converted to str, is 'a' or 'b'
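A runnable sketch of that condition. The example data and the column name `column` are hypothetical placeholders. Because each element goes through `str()`, numeric entries are compared by their text form; the vectorized line shows an equivalent using `astype(str)` with `isin`, offered as an alternative rather than the answer's own method.

```python
import pandas as pd

# Hypothetical example data; 'column' stands in for your real column name.
df = pd.DataFrame({"column": ["x", "b", 3]})
targets = ["a", "b"]

# Generator expression: any() stops at the first element whose str form matches.
found_loop = any(str(elem) in targets for elem in df["column"].tolist())

# Vectorized alternative: convert the whole column to str, then use isin.
found_vec = bool(df["column"].astype(str).isin(targets).any())

if found_loop:
    print("at least one of", targets, "is in the column")
print(found_loop, found_vec)  # True True
```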

User · Answer

Use `df[df['id'] == x].index.tolist()`. If `x` is present in `id`, it returns the list of indices where it is present; otherwise it gives an empty list.
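A small example of that approach with hypothetical data. The mask `df['id'] == x` selects the matching rows and `.index.tolist()` returns their index labels (labels, not necessarily integer positions); an empty list means `x` is absent. The helper name `matching_labels` is hypothetical.

```python
import pandas as pd

# Hypothetical data with a non-default index, to show that labels are returned.
df = pd.DataFrame({"id": [10, 43, 7, 43]}, index=["r0", "r1", "r2", "r3"])


def matching_labels(frame: pd.DataFrame, x) -> list:
    """Return the index labels of rows whose 'id' equals x (hypothetical helper)."""
    return frame[frame["id"] == x].index.tolist()


print(matching_labels(df, 43))  # ['r1', 'r3']
print(matching_labels(df, 99))  # []

# An empty list is falsy, so the result doubles as a membership test.
if matching_labels(df, 99):
    print("found")
else:
    print("not found")
```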

User · Answer

Suppose your dataframe has a `filename` column, and you want to check whether the filename "80900026941984" is present in it. You can simply write:

    if sum(df['filename'].astype('str').str.contains('80900026941984')) > 0:
        print('found')
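A runnable sketch with a hypothetical `filename` column. `Series.str.contains` searches for a substring (treated as a regular expression by default), so it also matches longer entries that merely contain the number; `regex=False` treats the pattern literally, and comparing with `==` checks for an exact match instead.

```python
import pandas as pd

# Hypothetical data: filenames stored as a mix of integers and strings.
df = pd.DataFrame({"filename": [80900026941984, "x180900026941984.txt", 12345]})
target = "80900026941984"

as_text = df["filename"].astype(str)

# Substring search, as in the answer (regex=False: match the text literally).
substring_hits = as_text.str.contains(target, regex=False)
if substring_hits.sum() > 0:
    print("found (substring)")

# Exact match: only entries whose full text equals the target.
exact_hits = as_text == target

print(substring_hits.tolist())  # [True, True, False]
print(exact_hits.tolist())      # [True, False, False]
```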

User · Answer

    found = df[df['Column'].str.contains('Text to search')]
    print(found.count())

`found.count()` reports, for each column, how many of the matching rows hold non-null values. If the count is 0, the string was not found in the Column.
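A sketch with hypothetical data showing the difference between `found.count()`, which returns one non-null count per column, and `len(found)`, which is the number of matching rows. It also passes `na=False` so that missing values count as "no match"; without it, `str.contains` yields NaN for missing entries and the boolean indexing fails.

```python
import pandas as pd

# Hypothetical data, including missing values in both columns.
df = pd.DataFrame({
    "Column": ["some Text to search here", "nothing", None, "Text to search"],
    "other": [1, 2, 3, None],
})

# na=False treats missing values as "no match" so the mask stays boolean.
mask = df["Column"].str.contains("Text to search", na=False)
found = df[mask]

print(found.count())  # per-column non-null counts: Column 2, other 1
print(len(found))     # 2 matching rows

if len(found) == 0:
    print("string was not found in the Column")
```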

User · Answer

I did a few simple tests:

    In [10]: x = pd.Series(range(1000000))

    In [13]: %timeit 999999 in x.values
    567 µs ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [15]: %timeit x.isin([999999]).any()
    9.54 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [16]: %timeit (x == 999999).any()
    6.86 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [17]: %timeit 999999 in set(x)
    79.8 ms ± 1.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [21]: %timeit x.eq(999999).any()
    7.03 ms ± 33.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [22]: %timeit x.eq(9).any()
    7.04 ms ± 60 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [24]: %timeit 9 in x.values
    666 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Interestingly, it doesn't matter whether you look up 9 or 999999: the `in` syntax takes about the same amount of time either way. That is not binary search, though. For a NumPy array, `in` compares the value against every element and then checks whether any comparison is True, so the cost does not depend on where (or whether) the value appears:

    In [24]: %timeit 9 in x.values
    666 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [25]: %timeit 9999 in x.values
    647 µs ± 5.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [26]: %timeit 999999 in x.values
    642 µs ± 2.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [27]: %timeit 99199 in x.values
    644 µs ± 5.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [28]: %timeit 1 in x.values
    667 µs ± 20.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It seems like using `x.values` is the fastest, but maybe there is a more elegant way in pandas.
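A sketch for re-running the comparison outside IPython with the standard `timeit` module. Absolute timings depend on the machine and on the pandas and NumPy versions, so no numbers are claimed here; the function name `benchmark` and its parameters are hypothetical. The final assertion checks that `v in arr` agrees with `(arr == v).any()` for a NumPy array, since `ndarray.__contains__` evaluates that element-wise comparison over the whole array.

```python
import timeit

import numpy as np
import pandas as pd


def benchmark(n: int = 1_000_000, value: int = 999_999, number: int = 10) -> dict:
    """Time several membership checks on a Series of n integers.

    Returns the average seconds per call for each approach (machine-dependent).
    """
    x = pd.Series(range(n))
    candidates = {
        "value in x.values": lambda: value in x.values,
        "(x == value).any()": lambda: (x == value).any(),
        "x.eq(value).any()": lambda: x.eq(value).any(),
        "x.isin([value]).any()": lambda: x.isin([value]).any(),
        "value in set(x)": lambda: value in set(x),
    }
    return {name: timeit.timeit(fn, number=number) / number
            for name, fn in candidates.items()}


if __name__ == "__main__":
    # Print approaches from fastest to slowest on this machine.
    for name, secs in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:24s} {secs * 1e3:8.3f} ms")

    # `in` on an ndarray agrees with an explicit element-wise comparison.
    arr = np.arange(10)
    assert (9 in arr) == bool((arr == 9).any())
```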

User · Answer

`in` of a Series checks whether the value is in the index:

    In [11]: s = pd.Series(list('abc'))

    In [12]: s
    Out[12]:
    0    a
    1    b
    2    c
    dtype: object

    In [13]: 1 in s
    Out[13]: True

    In [14]: 'a' in s
    Out[14]: False

One option is to see if it's in the unique values:

    In [21]: s.unique()
    Out[21]: array(['a', 'b', 'c'], dtype=object)

    In [22]: 'a' in s.unique()
    Out[22]: True

or a Python set:

    In [23]: set(s)
    Out[23]: {'a', 'b', 'c'}

    In [24]: 'a' in set(s)
    Out[24]: True

As pointed out by @DSM, it may be more efficient (especially if you're just doing this for one value) to just use `in` directly on the values:

    In [31]: s.values
    Out[31]: array(['a', 'b', 'c'], dtype=object)

    In [32]: 'a' in s.values
    Out[32]: True
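A self-contained reproduction of the behaviour described in the question, using hypothetical data: with the default index 0..99, the label 43 exists, so `43 in df['id']` is True even though no id equals 43. The value-based checks from this answer return the expected False.

```python
import pandas as pd

# Hypothetical frame: 100 rows, default index 0..99, ids that never equal 43.
df = pd.DataFrame({"id": range(1000, 1100)})

print(43 in df["id"])            # True: 43 is an index label
print(43 in df["id"].values)     # False: checks the values
print(43 in df["id"].unique())   # False
print(43 in set(df["id"]))       # False
print(df[df["id"] == 43].empty)  # True: no matching rows
```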
