Logical operators for boolean indexing in Pandas

Question

I'm working with boolean indexing in Pandas. Why does the statement

    a[(a['some_column'] == some_number) & (a['some_other_column'] == some_other_number)]

work fine, whereas

    a[(a['some_column'] == some_number) and (a['some_other_column'] == some_other_number)]

exits with an error?

Example:

    a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

    In: a[(a['x'] == 1) & (a['y'] == 10)]
    Out:
       x   y
    0  1  10

    In: a[(a['x'] == 1) and (a['y'] == 10)]
    Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

User · Accepted Answer

When you say

    (a['x'] == 1) and (a['y'] == 10)

you are implicitly asking Python to convert (a['x'] == 1) and (a['y'] == 10) to boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value. In other words, they raise

    ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a boolean value. That's because it's unclear when it should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might want it to be True only if all its elements are True. Others might want it to be True if any of its elements are True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit, by using the empty attribute or calling the all() or any() method to indicate which behavior you want.

In this case, however, it looks like you do not want boolean evaluation; you want element-wise logical-and. That is what the & binary operator performs:

    (a['x'] == 1) & (a['y'] == 10)

returns a boolean array.

By the way, as alexpmil notes, the parentheses are mandatory since & has a higher operator precedence than ==.

Without the parentheses, a['x'] == 1 & a['y'] == 10 would be evaluated as a['x'] == (1 & a['y']) == 10, which would in turn be equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. The use of and with two Series would again trigger the same ValueError as above. That's why the parentheses are mandatory.
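The three cases described in this answer can be checked side by side. The following is a minimal sketch that reuses the DataFrame from the question; the variable names (mask, and_error, paren_error and so on) are illustrative only and do not come from the original post.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# Element-wise AND: parenthesize each comparison, then combine with &.
mask = (a['x'] == 1) & (a['y'] == 10)
result = a[mask]

# Python's `and` calls bool() on the left-hand Series, which raises ValueError.
try:
    a[(a['x'] == 1) and (a['y'] == 10)]
    and_error = None
except ValueError as exc:
    and_error = exc

# Without parentheses, & binds tighter than ==, so this becomes the chained
# comparison a['x'] == (1 & a['y']) == 10, which also ends up calling bool()
# on a Series.
try:
    a[a['x'] == 1 & a['y'] == 10]
    paren_error = None
except ValueError as exc:
    paren_error = exc

# If a single True/False really is wanted, say which reduction is meant.
any_match = bool(mask.any())
all_match = bool(mask.all())
```

Here result holds only the first row, both try blocks end up in the except branch, and any_match and all_match differ because only one of the two rows satisfies both conditions.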

User · Answer

Logical operators for boolean indexing in Pandas

It's important to realize that you cannot use any of the Python logical operators (and, or, not) on pandas.Series or pandas.DataFrame objects (similarly, you cannot use them on numpy.array objects with more than one element). The reason is that they implicitly call bool on their operands, which throws an exception because these data structures decided that the boolean value of an array is ambiguous:

    >>> import numpy as np
    >>> import pandas as pd
    >>> arr = np.array([1, 2, 3])
    >>> s = pd.Series([1, 2, 3])
    >>> df = pd.DataFrame([1, 2, 3])
    >>> bool(arr)
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    >>> bool(s)
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    >>> bool(df)
    ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I covered this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

NumPy's logical functions

NumPy provides element-wise equivalents to these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

- and has np.logical_and
- or has np.logical_or
- not has np.logical_not
- np.logical_xor has no Python equivalent but is a logical "exclusive or" operation

So, essentially, one should use (assuming df1 and df2 are pandas DataFrames):

    np.logical_and(df1, df2)
    np.logical_or(df1, df2)
    np.logical_not(df1)
    np.logical_xor(df1, df2)

Bitwise functions and bitwise operators for booleans

If you have boolean NumPy arrays, pandas Series, or pandas DataFrames, you can also use the element-wise bitwise functions (for booleans they are, or at least should be, indistinguishable from the logical functions):

- bitwise and: np.bitwise_and or the & operator
- bitwise or: np.bitwise_or or the | operator
- bitwise not: np.invert (or the alias np.bitwise_not) or the ~ operator
- bitwise xor: np.bitwise_xor or the ^ operator

Typically the operators are used. However, when they are combined with comparison operators, one has to remember to wrap each comparison in parentheses, because the bitwise operators have a higher precedence than the comparison operators:

    (df1 < 10) | (df2 > 10)   # instead of the wrong df1 < 10 | df2 > 10

This may be irritating because the Python logical operators have a lower precedence than the comparison operators, so you normally write a < 10 and b > 10 (where a and b are, for example, simple integers) and don't need the parentheses.

Differences between logical and bitwise operations (on non-booleans)

It is really important to stress that bitwise and logical operations are only equivalent for boolean NumPy arrays (and boolean Series and DataFrames). If these don't contain booleans, the operations will give different results. I'll include examples using NumPy arrays, but the results will be similar for the pandas data structures:

    >>> import numpy as np
    >>> a1 = np.array([0, 0, 1, 1])
    >>> a2 = np.array([0, 1, 0, 1])

    >>> np.logical_and(a1, a2)
    array([False, False, False,  True])
    >>> np.bitwise_and(a1, a2)
    array([0, 0, 0, 1], dtype=int32)

And since NumPy (and similarly pandas) does different things for boolean indices (boolean or "mask" index arrays) and integer indices (index arrays), the results of indexing will also be different:

    >>> a3 = np.array([1, 2, 3, 4])

    >>> a3[np.logical_and(a1, a2)]
    array([4])
    >>> a3[np.bitwise_and(a1, a2)]
    array([1, 1, 1, 2])

Summary table

    Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
    ------------------------------------------------------------------------------------
    and              | np.logical_and         | np.bitwise_and         | &
    or               | np.logical_or          | np.bitwise_or          | |
                     | np.logical_xor         | np.bitwise_xor         | ^
    not              | np.logical_not         | np.invert              | ~

The logical operators do not work for NumPy arrays, pandas Series, and pandas DataFrames. The others work on these data structures (and on plain Python objects) and work element-wise.

However, be careful with the bitwise invert on plain Python bools, because the bool will be interpreted as an integer in this context (for example, ~False returns -1 and ~True returns -2).
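The difference between logical and bitwise operations is easier to see with integers other than 0 and 1. The following is a small sketch using NumPy only; the arrays and variable names are made up for illustration and are not taken from the answer above.

```python
import numpy as np

a1 = np.array([1, 2, 0])
a2 = np.array([2, 2, 5])

# Logical AND looks only at truthiness (non-zero means True).
logical = np.logical_and(a1, a2)

# Bitwise AND combines the bits of each pair: 1 & 2 == 0, 2 & 2 == 2, 0 & 5 == 0.
bitwise = a1 & a2

# Converting to bool first makes the bitwise operator agree with the logical function.
safe = a1.astype(bool) & a2.astype(bool)

# The same applies to inversion: ~ on integers flips bits, it does not negate truthiness.
inverted_bits = ~a1
inverted_truth = np.logical_not(a1)
```

Note that the first pair (1 and 2) is logically true on both sides, yet its bitwise AND is 0, so an integer array passed through & cannot be relied on as a truth mask unless it is converted to booleans first.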

User · Answer

TLDR: Logical operators in Pandas are &, | and ~, and the parentheses (...) are important!

Python's and, or and not logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve a vectorized (element-wise) version of this functionality.

So the following in Python (exp1 and exp2 are expressions which evaluate to a boolean result)...

    exp1 and exp2              # Logical AND
    exp1 or exp2               # Logical OR
    not exp1                   # Logical NOT

...will translate to...

    exp1 & exp2                # Element-wise logical AND
    exp1 | exp2                # Element-wise logical OR
    ~exp1                      # Element-wise logical NOT

for pandas.

If, in the process of performing a logical operation, you get a ValueError, then you need to use parentheses for grouping:

    (exp1) op (exp2)

For example,

    (df['col1'] == x) & (df['col2'] == y)

And so on.

Boolean Indexing

A common operation is to compute boolean masks through logical conditions to filter the data. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.

Consider the following setup:

    np.random.seed(0)
    df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
    df

       A  B  C
    0  5  0  3
    1  3  7  9
    2  3  5  2
    3  4  7  6
    4  8  8  1

Logical AND

For df above, say you'd like to return all rows where A < 5 and B > 5. This is done by computing masks for each condition separately, and ANDing them.

Overloaded Bitwise & Operator

Before continuing, please take note of this particular excerpt of the docs, which states:

"Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3)."

So, with this in mind, element-wise logical AND can be implemented with the bitwise operator &:

    df['A'] < 5

    0    False
    1     True
    2     True
    3     True
    4    False
    Name: A, dtype: bool

    df['B'] > 5

    0    False
    1     True
    2    False
    3     True
    4     True
    Name: B, dtype: bool

    (df['A'] < 5) & (df['B'] > 5)

    0    False
    1     True
    2    False
    3     True
    4    False
    dtype: bool

And the subsequent filtering step is simply:

    df[(df['A'] < 5) & (df['B'] > 5)]

       A  B  C
    1  3  7  9
    3  4  7  6

The parentheses are used to override the default precedence order of the bitwise operators, which have higher precedence than the comparison operators < and >. See the section on Operator Precedence in the Python docs.

If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as

    df['A'] < 5 & df['B'] > 5

it is parsed as

    df['A'] < (5 & df['B']) > 5

which becomes

    df['A'] < something_you_dont_want > 5

which becomes (see the Python docs on chained operator comparison)

    (df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

which becomes

    # Both operands are Series...
    something_else_you_dont_want1 and something_else_you_dont_want2

which throws

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So, don't make that mistake! [1]

Avoiding Parentheses Grouping

The fix is actually quite simple. Most operators have a corresponding bound method for DataFrames. If the individual masks are built up using functions instead of comparison operators, you will no longer need to group by parentheses to specify evaluation order:

    df['A'].lt(5)

    0    False
    1     True
    2     True
    3     True
    4    False
    Name: A, dtype: bool

    df['B'].gt(5)

    0    False
    1     True
    2    False
    3     True
    4     True
    Name: B, dtype: bool

    df['A'].lt(5) & df['B'].gt(5)

    0    False
    1     True
    2    False
    3     True
    4    False
    dtype: bool

See the section on Flexible Comparisons. To summarise, we have:

    Operator | Function
    -------------------
    >        | gt
    >=       | ge
    <        | lt
    <=       | le
    ==       | eq
    !=       | ne

Another option for avoiding parentheses is to use DataFrame.query (or eval):

    df.query('A < 5 and B > 5')

       A  B  C
    1  3  7  9
    3  4  7  6

I have extensively documented query and eval in "Dynamic Expression Evaluation in pandas using pd.eval()".

operator.and_

Allows you to perform this operation in a functional manner. Internally calls Series.__and__, which corresponds to the bitwise operator.

    import operator

    operator.and_(df['A'] < 5, df['B'] > 5)
    # Same as,
    # (df['A'] < 5).__and__(df['B'] > 5)

    0    False
    1     True
    2    False
    3     True
    4    False
    dtype: bool

    df[operator.and_(df['A'] < 5, df['B'] > 5)]

       A  B  C
    1  3  7  9
    3  4  7  6

You won't usually need this, but it is useful to know.

Generalizing: np.logical_and (and logical_and.reduce)

Another alternative is using np.logical_and, which also does not need parentheses grouping:

    np.logical_and(df['A'] < 5, df['B'] > 5)

    0    False
    1     True
    2    False
    3     True
    4    False
    Name: A, dtype: bool

    df[np.logical_and(df['A'] < 5, df['B'] > 5)]

       A  B  C
    1  3  7  9
    3  4  7  6

np.logical_and is a ufunc (Universal Function), and most ufuncs have a reduce method. This means it is easier to generalise with logical_and if you have multiple masks to AND. For example, to AND masks m1, m2 and m3 with &, you would have to do

    m1 & m2 & m3

However, an easier option is

    np.logical_and.reduce([m1, m2, m3])

This is powerful, because it lets you build on top of this with more complex logic (for example, dynamically generating masks in a list comprehension and ANDing all of them):

    cols = ['A', 'B']
    ops = [np.less, np.greater]
    values = [5, 5]

    m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
    m
    # array([False,  True, False,  True, False])

    df[m]

       A  B  C
    1  3  7  9
    3  4  7  6

[1] I know I'm harping on this point, but please bear with me. This is a very, very common beginner's mistake, and must be explained very thoroughly.

Logical OR

For the df above, say you'd like to return all rows where A == 3 or B == 7.

Overloaded Bitwise |

    df['A'] == 3

    0    False
    1     True
    2     True
    3    False
    4    False
    Name: A, dtype: bool

    df['B'] == 7

    0    False
    1     True
    2    False
    3     True
    4    False
    Name: B, dtype: bool

    (df['A'] == 3) | (df['B'] == 7)

    0    False
    1     True
    2     True
    3     True
    4    False
    dtype: bool

    df[(df['A'] == 3) | (df['B'] == 7)]

       A  B  C
    1  3  7  9
    2  3  5  2
    3  4  7  6

If you haven't yet, please also read the section on Logical AND above; all caveats apply here.

Alternatively, this operation can be specified with

    df[df['A'].eq(3) | df['B'].eq(7)]

       A  B  C
    1  3  7  9
    2  3  5  2
    3  4  7  6

operator.or_

Calls Series.__or__ under the hood.

    operator.or_(df['A'] == 3, df['B'] == 7)
    # Same as,
    # (df['A'] == 3).__or__(df['B'] == 7)

    0    False
    1     True
    2     True
    3     True
    4    False
    dtype: bool

    df[operator.or_(df['A'] == 3, df['B'] == 7)]

       A  B  C
    1  3  7  9
    2  3  5  2
    3  4  7  6

np.logical_or

For two conditions, use logical_or:

    np.logical_or(df['A'] == 3, df['B'] == 7)

    0    False
    1     True
    2     True
    3     True
    4    False
    Name: A, dtype: bool

    df[np.logical_or(df['A'] == 3, df['B'] == 7)]

       A  B  C
    1  3  7  9
    2  3  5  2
    3  4  7  6

For multiple masks, use logical_or.reduce:

    np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
    # array([False,  True,  True,  True, False])

    df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

       A  B  C
    1  3  7  9
    2  3  5  2
    3  4  7  6

Logical NOT

Given a mask, such as

    mask = pd.Series([True, True, False])

If you need to invert every boolean value (so that the end result is [False, False, True]), then you can use any of the methods below.

Bitwise ~

    ~mask

    0    False
    1    False
    2     True
    dtype: bool

Again, expressions need to be parenthesised.

    ~(df['A'] == 3)

    0     True
    1    False
    2    False
    3     True
    4     True
    Name: A, dtype: bool

This internally calls

    mask.__invert__()

    0    False
    1    False
    2     True
    dtype: bool

But don't use it directly.

operator.inv

Internally calls __invert__ on the Series.

    operator.inv(mask)

    0    False
    1    False
    2     True
    dtype: bool

np.logical_not

This is the NumPy variant.

    np.logical_not(mask)

    0    False
    1    False
    2     True
    dtype: bool

Note: np.bitwise_and can be substituted for np.logical_and, bitwise_or for logical_or, and invert for logical_not.
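As a rough sketch of how the reduce approach compares with the operator and query forms, the snippet below builds the same DataFrame values as this answer's setup (written out literally instead of relying on the random seed) and applies three conditions. The helper names combine_all and combine_any are hypothetical, introduced here only for illustration, and the third condition on column C is an assumption added to show more than two masks.

```python
import numpy as np
import pandas as pd

# Same values as the seeded example, written out so the result is deterministic.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

def combine_all(masks):
    """AND together any number of boolean masks (hypothetical helper)."""
    return np.logical_and.reduce(masks)

def combine_any(masks):
    """OR together any number of boolean masks (hypothetical helper)."""
    return np.logical_or.reduce(masks)

masks = [df['A'] < 5, df['B'] > 5, df['C'] != 9]

with_reduce = df[combine_all(masks)]
with_operators = df[(df['A'] < 5) & (df['B'] > 5) & (df['C'] != 9)]
with_query = df.query('A < 5 and B > 5 and C != 9')

either = df[combine_any([df['A'] == 3, df['B'] == 7])]
not_three = df[~(df['A'] == 3)]
```

All three AND variants select the same single row here, so the choice between them is mostly about readability: the operator form suits a fixed handful of conditions, while the reduce form is convenient when the list of masks is built at run time.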
