Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Question

I'm having an issue filtering my result dataframe with an `or` condition. I want my result df to extract all values of column `var` that are above 0.25 or below -0.25. The logic below gives me an ambiguous truth value; however, it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested a.empty, a.bool(), a.item(), a.any() or a.all().

    result = result[(result['var'] > 0.25) or (result['var'] < -0.25)]

User · Answer

Or, alternatively, you could use the operator module. More detailed information is in the Python docs.

    import operator
    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
    df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

              A         B         C
    0  1.764052  0.400157  0.978738
    1  2.240893  1.867558 -0.977278
    3  0.410599  0.144044  1.454274
    4  0.761038  0.121675  0.4438

User · Answer

The `or` and `and` Python statements require truth values. For pandas these are considered ambiguous, so you should use the "bitwise" `|` (or) or `&` (and) operations:

    result = result[(result['var'] > 0.25) | (result['var'] < -0.25)]

These are overloaded for these kinds of data structures to yield the element-wise or (or and).

Just to add some more explanation to this statement:

The exception is thrown when you want to get the bool of a pandas.Series:

    >>> import pandas as pd
    >>> x = pd.Series([1])
    >>> bool(x)
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What you hit was a place where the operator implicitly converted the operands to bool (you used `or`, but it also happens for `and`, `if` and `while`):

    >>> x or x
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    >>> x and x
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    >>> if x:
    ...     print('fun')
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    >>> while x:
    ...     print('fun')
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Besides these 4 statements there are several Python functions that hide some bool calls (like `any`, `all`, `filter`, ...). These are normally not problematic with pandas.Series, but for completeness I wanted to mention them.

In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For `and` and `or` you can use (if you want element-wise comparisons):

numpy.logical_or:

    >>> import numpy as np
    >>> np.logical_or(x, y)

or simply the `|` operator:

    >>> x | y

numpy.logical_and:

    >>> np.logical_and(x, y)

or simply the `&` operator:

    >>> x & y

If you're using the operators, make sure you set your parentheses correctly because of operator precedence.

There are several logical numpy functions which should work on pandas.Series.

The alternatives mentioned in the exception are more suited if you encountered it when doing `if` or `while`. I'll briefly explain each of these:

If you want to check if your Series is empty:

    >>> x = pd.Series([])
    >>> x.empty
    True
    >>> x = pd.Series([1])
    >>> x.empty
    False

Python normally interprets the length of containers (like list, tuple, ...) as the truth value if there is no explicit boolean interpretation. So if you want the Python-like check, you could do `if x.size` or `if not x.empty` instead of `if x`.

If your Series contains one and only one boolean value (note that `.bool()` is deprecated in recent pandas releases; `.item()` covers the same case):

    >>> x = pd.Series([100])
    >>> (x > 50).bool()
    True
    >>> (x < 50).bool()
    False

If you want to check the first and only item of your Series (like `.bool()`, but it works even for non-boolean contents):

    >>> x = pd.Series([100])
    >>> x.item()
    100

If you want to check if all or any item is not-zero, not-empty or not-False:

    >>> x = pd.Series([0, 1, 2])
    >>> x.all()   # because one element is zero
    False
    >>> x.any()   # because one (or more) elements are non-zero
    True
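To make the distinction concrete, here is a minimal sketch contrasting the failing `or`, the element-wise `|`, and the reductions meant for `if`/`while`. The sample values and variable names are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for result['var'].
s = pd.Series([0.5, -0.1, -0.9], name="var")

# Plain `or` asks Python for bool(Series), which pandas refuses to guess.
try:
    (s > 0.25) or (s < -0.25)
    raised = False
except ValueError:
    raised = True

# Element-wise OR: one boolean per row, usable as a filter.
mask = (s > 0.25) | (s < -0.25)
filtered = s[mask]

# Reductions: a single boolean for the whole Series, usable in `if`/`while`.
any_above = bool((s > 0.25).any())
all_above = bool((s > 0.25).all())
is_empty = s.empty
```

The mask keeps one truth value per row, which is what filtering needs; `.any()`, `.all()` and `.empty` collapse the Series to one value, which is what a control-flow statement needs.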

User · Answer

I encountered the same error and was stuck on it with a PySpark dataframe for a few days. I was able to resolve it by filling NA values with 0, since I was comparing integer values from two fields.

User · Answer

One minor thing which wasted my time: put each condition (for example, a comparison using `==` or `!=`) in parentheses. Failing to do so also raises this exception.

This will work:

    df[(some condition) conditional operator (some condition)]

This will not:

    df[some condition conditional operator some condition]
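The reason is operator precedence: `&` and `|` bind more tightly than comparison operators. A small sketch with a made-up DataFrame (the column names `a` and `b` are hypothetical) shows both cases:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 5, 10], "b": [3, 3, 3]})

# Parenthesised: each comparison is evaluated first, then combined with &.
good = df[(df["a"] > 2) & (df["b"] == 3)]

# Unparenthesised: & binds tighter than > and ==, so Python reads this as
# df["a"] > (2 & df["b"]) == 3, a chained comparison that calls bool()
# on an intermediate Series and therefore raises the same ValueError.
try:
    df[df["a"] > 2 & df["b"] == 3]
    failed = False
except ValueError:
    failed = True
```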

User · Answer

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:

    result = result.query("(var > 0.25) or (var < -0.25)")

See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query.

(Some tests with a dataframe I'm currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 µs.)

A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2), and wanted to perform the following query:

    "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"

I obtained the following exception cascade:

    KeyError: 'log2'
    UndefinedVariableError: name 'log2' is not defined
    ValueError: "log2" is not a supported function

I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here
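As a rough check that `query` and the boolean mask select the same rows, here is a small sketch on made-up data. It also shows backtick quoting, which reasonably recent pandas versions accept for column names that are not valid Python identifiers; this may or may not be the workaround the answer links to, and the column name `log2 ratio` is hypothetical.

```python
import pandas as pd

# Hypothetical data; "log2 ratio" is an invented column name with a space.
df = pd.DataFrame({"var": [0.5, -0.1, -0.9], "log2 ratio": [1.5, 0.2, 2.0]})

# The same filter written with a boolean mask and with query().
by_mask = df[(df["var"] > 0.25) | (df["var"] < -0.25)]
by_query = df.query("(var > 0.25) or (var < -0.25)")

# Backticks quote a column name that is not a valid Python identifier.
quoted = df.query("`log2 ratio` > 1 and var > 0")
```

Inside the query string, `or` and `and` are interpreted element-wise, which is why they do not trigger the ambiguity error there.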

User · Answer

You need to use the bitwise operators `|` instead of `or` and `&` instead of `and` in pandas; you can't simply use the boolean statements from Python.

For more complex filtering, create a mask and apply the mask to the dataframe. Put all your query in the mask and apply it. Suppose:

    mask = (df["col1"] >= df["col2"]) & (df["col1"] <= df["col2"])
    df_new = df[mask]

User · Answer

Well, pandas uses the bitwise `&` and `|`, and each condition should be wrapped in `()`.

For example, the following works:

    data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without proper brackets does not:

    data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]

User · Answer

For boolean logic, use `&` and `|`.

    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))

    >>> df
              A         B         C
    0  1.764052  0.400157  0.978738
    1  2.240893  1.867558 -0.977278
    2  0.950088 -0.151357 -0.103219
    3  0.410599  0.144044  1.454274
    4  0.761038  0.121675  0.443863

    >>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
              A         B         C
    0  1.764052  0.400157  0.978738
    1  2.240893  1.867558 -0.977278
    3  0.410599  0.144044  1.454274
    4  0.761038  0.121675  0.443863

To see what is happening, you get a column of booleans for each comparison, e.g.

    df.C > 0.25
    0     True
    1    False
    2    False
    3     True
    4     True
    Name: C, dtype: bool

When you have multiple criteria, you will get multiple columns returned. This is why the join logic is ambiguous. Using `and` or `or` treats each column as a whole, so you first need to reduce that column to a single boolean value. For example, to see if any value or all values in each of the columns is True:

    # Any value in either column is True?
    (df.C > 0.25).any() or (df.C < -0.25).any()
    True

    # All values in either column are True?
    (df.C > 0.25).all() or (df.C < -0.25).all()
    False

One convoluted way to achieve the same thing is to zip all of these columns together and perform the appropriate logic.

    >>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
              A         B         C
    0  1.764052  0.400157  0.978738
    1  2.240893  1.867558 -0.977278
    3  0.410599  0.144044  1.454274
    4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.
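The three element-wise approaches mentioned in this thread can be checked against each other on the same seeded DataFrame. This is only a sketch; the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# One boolean column per criterion.
above = df.C > 0.25
below = df.C < -0.25

# Three ways to combine them row by row.
by_operator = df.loc[above | below]
by_numpy = df.loc[np.logical_or(above, below)]
by_zip = df[[a or b for a, b in zip(above, below)]]
```

Inside the list comprehension `a or b` is fine, because `a` and `b` are single booleans rather than whole Series.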

User · Answer

I'll try to give a benchmark of the three most common ways (also mentioned above):

    from timeit import repeat

    setup = """
    import numpy as np
    import random
    x = np.linspace(0, 100)
    lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
    """
    stmts = 'x[(x > lb) * (x <= ub)]', 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

    for _ in range(3):
        for stmt in stmts:
            t = min(repeat(stmt, setup, number=100_000))
            print('%.4f' % t, stmt)
        print()

Result:

    0.4808 x[(x > lb) * (x <= ub)]
    0.4726 x[(x > lb) & (x <= ub)]
    0.4904 x[np.logical_and(x > lb, x <= ub)]

    0.4725 x[(x > lb) * (x <= ub)]
    0.4806 x[(x > lb) & (x <= ub)]
    0.5002 x[np.logical_and(x > lb, x <= ub)]

    0.4781 x[(x > lb) * (x <= ub)]
    0.4336 x[(x > lb) & (x <= ub)]
    0.4974 x[np.logical_and(x > lb, x <= ub)]

But `*` is not supported on a pandas Series, and a NumPy array is faster than a pandas DataFrame (the DataFrame is around 1000 times slower; note the different `number` values):

    from timeit import repeat

    setup = """
    import numpy as np
    import random
    import pandas as pd
    x = pd.DataFrame(np.linspace(0, 100))
    lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
    """
    stmts = 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

    for _ in range(3):
        for stmt in stmts:
            t = min(repeat(stmt, setup, number=100))
            print('%.4f' % t, stmt)
        print()

Result:

    0.1964 x[(x > lb) & (x <= ub)]
    0.1992 x[np.logical_and(x > lb, x <= ub)]

    0.2018 x[(x > lb) & (x <= ub)]
    0.1838 x[np.logical_and(x > lb, x <= ub)]

    0.1871 x[(x > lb) & (x <= ub)]
    0.1883 x[np.logical_and(x > lb, x <= ub)]

Note: adding one line of code, `x = x.to_numpy()`, costs about 20 µs.

For those who prefer `%timeit` (IPython/Jupyter only):

    import numpy as np
    import pandas as pd
    import random

    lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
    x = pd.DataFrame(np.linspace(0, 100))

    def asterik(x):
        x = x.to_numpy()
        return x[(x > lb) * (x <= ub)]

    def and_symbol(x):
        x = x.to_numpy()
        return x[(x > lb) & (x <= ub)]

    def numpy_logical(x):
        x = x.to_numpy()
        return x[np.logical_and(x > lb, x <= ub)]

    for i in range(3):
        %timeit asterik(x)
        %timeit and_symbol(x)
        %timeit numpy_logical(x)
        print('\n')

Result:

    23 µs ± 3.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    35.6 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    31.3 µs ± 8.9 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    21.4 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    21.9 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    21.7 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    25.1 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    36.8 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    28.2 µs ± 5.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
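Timings aside, it may be worth confirming that the three benchmarked expressions select the same elements on a NumPy array. The sketch below uses fixed bounds (an assumption, chosen so the result is reproducible) instead of the random ones in the benchmark.

```python
import numpy as np

x = np.linspace(0, 100)   # 50 evenly spaced points, as in the benchmark
lb, ub = 20.0, 60.0       # fixed, hypothetical bounds instead of random ones

# On boolean arrays, * acts as an element-wise AND, like & and logical_and.
by_product = x[(x > lb) * (x <= ub)]
by_bitwise = x[(x > lb) & (x <= ub)]
by_logical = x[np.logical_and(x > lb, x <= ub)]
```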
