How to implement a Boolean search with multiple columns in pandas

Question

I have a pandas df and would like to accomplish something along these lines  in SQL terms    SELECT   FROM df WHERE column1    a  OR column2    b  OR column3    c  etc    Now this works  for one column value pair   foo   df loc df  column    value    However  I m not sure how to expand that to multiple column value pairs    To be clear  each column matches a different value

User · Answer

A more concise--but not necessarily faster--method is to use DataFrame isin   and DataFrame any    In  27   n   10  In  28   df   DataFrame randint 4  size  n  2    columns list  ab     In  29   df Out 29      a  b 0  0  0 1  1  1 2  1  1 3  2  3 4  2  3 5  0  2 6  1  2 7  3  0 8  1  1 9  2  2   10 rows x 2 columns   In  30   df isin  1  2   Out 30          a      b 0  False  False 1   True   True 2   True   True 3   True  False 4   True  False 5  False   True 6   True   True 7  False  False 8   True   True 9   True   True   10 rows x 2 columns   In  31   df isin  1  2   any 1  Out 31   0    False 1     True 2     True 3     True 4     True 5     True 6     True 7    False 8     True 9     True dtype  bool  In  32   df loc df isin  1  2   any 1   Out 32      a  b 1  1  1 2  1  1 3  2  3 4  2  3 5  0  2 6  1  2 8  1  1 9  2  2   8 rows x 2 columns

User · Answer

Easiest way to do this  if this helpful hit up arrow  Tahnks    students       jack1    Apples1    341                   Riti1    Mangos1     311                   Aadi1    Grapes1    301                   Sonia1    Apples1   321                   Lucy1    Mangos1     331                   Mike1    Apples1    351                   Mik    Apples1    np nan                   Create a DataFrame object df   pd DataFrame students  columns     Name1     Product1    Sale1     print df        Name1 Product1  Sale1 0   jack1  Apples1    341 1   Riti1  Mangos1    311 2   Aadi1  Grapes1    301 3  Sonia1  Apples1    321 4   Lucy1  Mangos1    331 5   Mike1  Apples1    351 6     Mik  Apples1    NaN    Select rows in above DataFrame for which    Product    column contains the value    Apples     subset   df df  Product1       Apples1   print subset    Name1 Product1  Sale1 0   jack1  Apples1    341 3  Sonia1  Apples1    321 5   Mike1  Apples1    351 6     Mik  Apples1    NA    Select rows in above DataFrame for which    Product    column contains the value    Apples     AND notnull value in Sale  subsetx  df  df  Product1       Apples1     amp   df  Sale1   notnull     print subsetx      Name1   Product1    Sale1 0   jack1   Apples1      341 3   Sonia1  Apples1      321 5   Mike1   Apples1      351    Select rows in above DataFrame for which    Product    column contains the value    Apples     AND Sale   351  subsetx  df  df  Product1       Apples1     amp   df  Sale1      351   print subsetx      Name1 Product1  Sale1 5  Mike1  Apples1    351    Another example subsetData   df df  Product1   isin   Mangos1    Grapes1      print subsetData   Name1 Product1  Sale1 1  Riti1  Mangos1    311 2  Aadi1  Grapes1    301 4  Lucy1  Mangos1    331    Here is the Original link I found this  I edit it a little bit --  https   thispointer com python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns

User · Answer

All the considerations made by  EdChum in 2014 are still valid  but the pandas Dataframe ix method is deprecated from the version 0 0 20 of pandas  Directly from the docs      Warning  Starting in 0 20 0  the  ix indexer is deprecated  in favor   of the more strict  iloc and  loc indexers    In subsequent versions of pandas  this method has been replaced by new indexing methods pandas Dataframe loc and pandas Dataframe iloc    If you want to learn more  in this post you can find comparisons between the methods mentioned above   Ultimately  to date  and there does not seem to be any change in the upcoming versions of pandas from this point of view   the answer to this question is as follows   foo   df loc  df  column1    value     df  columns2       b      df  column3       c

User · Answer

You need to enclose multiple conditions in braces due to operator precedence and use the bitwise and   amp   and or     operators   foo   df  df  column1    value     df  columns2       b      df  column3       c      If you use and or or  then pandas is likely to moan that the comparison is ambiguous  In that case  it is unclear whether we are comparing every value in a series in the condition  and what does it mean if only 1 or all but 1 match the condition  That is why you should use the bitwise operators or the numpy np all or np any to specify the matching criteria   There is also the query method  http   pandas pydata org pandas-docs dev generated pandas DataFrame query html  but there are some limitations mainly to do with issues where there could be ambiguity between column names and index values

User · Answer

the query   method can do that very intuitively  Express your condition in a string to be evaluated like the following example   df   df query  quot columnNameA  lt    x or columnNameB     y quot    with x and y are declared variables which you can refer to with

[python] How to implement a Boolean search with multiple columns in pandas

Examples related to python

Examples related to pandas