FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

Question

I am using Pandas 0.19.1 on Python 3. I am getting a warning on these lines of code. I'm trying to get a list that contains all the row numbers where the string Peter is present in column Unnamed: 5.

df = pd.read_excel(xls_path)
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()

It produces a Warning:

"\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison
  result = getattr(x, name)(y)"

What is this FutureWarning, and should I ignore it since it seems to work?

User · Answer

I get the same error when I try to set index_col while reading a file into a pandas DataFrame:

df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=['0'])  ## or same with the following
df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=[0])

I have never encountered such an error before. I am still trying to figure out the reason behind it (using @Eric Leschinski's explanation and others).

Anyhow, the following approach solves the problem for now until I figure the reason out:

df = pd.read_csv('my_file.tsv', sep='\t', header=0)  ## not setting the index_col
df.set_index(['0'], inplace=True)

I will update this as soon as I figure out the reason for such behavior.

User · Answer

I've compared a few of the methods possible for doing this, including pandas, several numpy methods, and a list comprehension method.

First, let's start with a baseline:

>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 µs, sys: 0 ns, total: 52 µs
Wall time: 56 µs
Count 2 using numpy equal with ints

So, our baseline is that the count should be correct (2), and we should take about 50 µs.

Now, we try the naive method:

>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 µs, sys: 24 µs, total: 169 µs
Wall time: 158 µs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  """Entry point for launching an IPython kernel.

And here, we get the wrong answer (NotImplemented != 2), it takes us a long time, and it throws the warning.

So we'll try another naive method:

>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 µs, sys: 1 µs, total: 47 µs
Wall time: 50.1 µs
Count 0 using ==

Again, the wrong answer (0 != 2). This is even more insidious because there are no subsequent warnings (0 can be passed around just like 2).

Now, let's try a list comprehension:

>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 µs, sys: 1 µs, total: 56 µs
Wall time: 60.3 µs
Count 2 using list comprehension

We get the right answer here, and it's pretty fast!

Another possibility: pandas

>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 µs, sys: 31 µs, total: 484 µs
Wall time: 463 µs
Count 2 using pandas ==

Slow, but correct!

And finally, the option I'm going to use: casting the numpy array to the object type:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 µs, sys: 1 µs, total: 51 µs
Wall time: 55.1 µs
Count 2 using numpy equal

Fast and correct!
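The object-dtype idea from this answer can be sketched without the IPython timing magic. This is a minimal illustration only: it uses the `==` operator on the object array rather than `np.equal`, and the variable names are made up for the example.

```python
import numpy as np

# A plain Python list compared with a string is a single bool, not a mask.
labels = ['s', 'b', 's', 'b']
list_result = (labels == 's')  # list vs. str is always False

# Casting to an object array makes NumPy apply Python's == to each element.
labels_obj = np.array(labels).astype(object)
mask = (labels_obj == 's')
count = int(np.sum(mask))

print(list_result, mask.tolist(), count)
```

The list comparison is the silent failure described above; the object array gives one bool per element, so the sum is a real count.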

User · Answer

Can't beat Eric Leschinski's awesomely detailed answer, but here's a quick workaround to the original question that I don't think has been mentioned yet - put the string in a list and use .isin instead of ==

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Raises warning using == to compare different types:
df.loc[df["Number"] == "2", "Number"]

# No warning using .isin:
df.loc[df["Number"].isin(["2"]), "Number"]
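One caveat worth sketching: `.isin` with a string against an integer column stays quiet, but it also finds no rows, because `2` and `"2"` are different values. The example below is an illustration with the same toy frame; the `astype(str)` conversion is one possible way to make the types agree, not something the answer itself prescribes.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# String against an integer column: no match, since 2 != "2".
no_match = df["Number"].isin(["2"])

# Make the types agree, either by converting the column to text
# or by passing the integer itself.
as_text = df["Number"].astype(str).isin(["2"])
as_int = df["Number"].isin([2])

print(no_match.tolist(), as_text.tolist(), as_int.tolist())
```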

User · Answer

In my case, the warning occurred because of just the regular type of boolean indexing -- because the series had only np.nan. Demonstration (pandas 1.0.3):

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([np.nan, 'Hi']) == 'Hi'
0    False
1     True
>>> pd.Series([np.nan, np.nan]) == 'Hi'
~/anaconda3/envs/ms3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:255: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)
0    False
1    False

I think with pandas 1.0 they really want you to use the new 'string' datatype, which allows for pd.NA values:

>>> pd.Series([pd.NA, pd.NA]) == 'Hi'
0    False
1    False
>>> pd.Series([np.nan, np.nan], dtype='string') == 'Hi'
0    <NA>
1    <NA>
>>> (pd.Series([np.nan, np.nan], dtype='string') == 'Hi').fillna(False)
0    False
1    False

I don't love that they tinkered with everyday functionality such as boolean indexing.
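The 'string' dtype approach can be wrapped in a small helper so the mask is always plain True/False. This is a sketch assuming pandas 1.0 or later; `equals_text` is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd

def equals_text(series, text):
    """Return a boolean mask; missing values count as no match."""
    # Hypothetical helper: cast to the 'string' dtype, compare,
    # then replace any <NA> results with False.
    return series.astype("string").eq(text).fillna(False)

all_missing = pd.Series([np.nan, np.nan])
mixed = pd.Series([np.nan, "Hi"])

print(equals_text(all_missing, "Hi").tolist())
print(equals_text(mixed, "Hi").tolist())
```

Filling with False treats a missing value as "not equal", which is what boolean indexing usually needs.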

User · Answer

In my case, the same warning message was caused by a TypeError:

TypeError: invalid type comparison

So, you may want to check the data type of the Unnamed: 5 column:

for x in df['Unnamed: 5']:
    print(type(x))  # are they 'str'?

Here is how I can replicate the warning message:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3
df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the Warning
df.loc[df['num3'] == 3, 'num3'] = 4  # No Error

Hope it helps.
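A minimal sketch of the "check the type first" advice, reusing the toy frame from this answer. Converting the comparison value with `int()` is an assumption about what the caller wants, used here only to show both sides ending up with the same type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3

# Inspect what the column holds before comparing against it.
print(df['num3'].dtype)
print(df['num3'].map(type).unique())

# Convert the comparison value to the column's type instead of
# comparing an integer column with the string '3'.
target = int('3')
df.loc[df['num3'] == target, 'num3'] = 4

print(df['num3'].tolist())
```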

User · Answer

This FutureWarning isn't from Pandas, it is from numpy, and the bug also affects matplotlib and others. Here's how to reproduce the warning nearer to the source of the trouble:

import numpy as np
print(np.__version__)   # Numpy version '1.12.0'
'x' in np.arange(5)       # Future warning thrown here

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
False

Another way to reproduce this bug using the double equals operator:

import numpy as np
np.arange(5) == np.arange(5).astype(str)    # FutureWarning thrown here

An example of Matplotlib affected by this FutureWarning under their quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html

What's going on here?

There is a disagreement between Numpy and native python on what should happen when you compare a string to numpy's numeric types. Notice the left operand is python's turf, a primitive string, and the middle operation is python's turf, but the right operand is numpy's turf. Should you return a Python style Scalar or a Numpy style ndarray of Boolean? Numpy says ndarray of bool, Pythonic developers disagree. Classic standoff.

Should it be elementwise comparison or Scalar if item exists in the array?

If your code or library is using the in or == operators to compare python string to numpy ndarrays, they aren't compatible, so if you try it, it returns a scalar, but only for now. The Warning indicates that in the future this behavior might change so your code pukes all over the carpet if python/numpy decide to adopt Numpy style.

Submitted Bug reports:

Numpy and Python are in a standoff, for now the operation returns a scalar, but in the future it may change.

https://github.com/numpy/numpy/issues/6784
https://github.com/pandas-dev/pandas/issues/7830

Two workaround solutions:

Either lock down your version of python and numpy, ignore the warnings and expect the behavior to not change, or convert both left and right operands of == and in to be from a numpy type or primitive python numeric type.

Suppress the warning globally:

import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5))   # returns False, without Warning

Suppress the warning on a line by line basis:

import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(2))   # returns False, warning is suppressed

print('x' in np.arange(10))   # returns False, Throws FutureWarning

Just suppress the warning by name, then put a loud comment next to it mentioning the current version of python and numpy, saying this code is brittle and requires these versions, and put a link to here. Kick the can down the road.

TLDR: pandas are Jedi; numpy are the hutts; and python is the galactic empire. https://youtu.be/OZczsiCfQQk?t=3
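The second workaround (making both operands the same kind of type) might look like the sketch below. The variable names are illustrative, and converting with `astype(str)` or `int()` is just one way to line the types up.

```python
import numpy as np

numbers = np.arange(5)

# Option 1: turn the array into strings so both sides are text.
as_text = numbers.astype(str)
x_in_text = 'x' in as_text
three_in_text = '3' in as_text

# Option 2: turn the string into a number so both sides are numeric.
three_in_numbers = int('3') in numbers

# Elementwise equality also works once the dtypes agree.
all_equal = bool((as_text == numbers.astype(str)).all())

print(x_in_text, three_in_text, three_in_numbers, all_equal)
```

With matching types there is no string-versus-number comparison left for NumPy to warn about.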

User · Answer

Eric's answer helpfully explains that the trouble comes from comparing a Pandas Series (containing a NumPy array) to a Python string. Unfortunately, his two workarounds both just suppress the warning.

To write code that doesn't cause the warning in the first place, explicitly compare your string to each element of the Series and get a separate bool for each. For example, you could use map and an anonymous function.

myRows = df[df['Unnamed: 5'].map(lambda x: x == 'Peter')].index.tolist()
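Here is that line as a small runnable sketch. The frame contents are made up for illustration; the column deliberately mixes strings, a number, and a missing value.

```python
import numpy as np
import pandas as pd

# Illustrative frame with mixed types in the column of interest.
df = pd.DataFrame({'Unnamed: 5': ['Peter', 3, np.nan, 'Peter', 'Paul']})

# map() calls the lambda once per element, so each row gets its own bool.
is_peter = df['Unnamed: 5'].map(lambda x: x == 'Peter')
myRows = df[is_peter].index.tolist()

print(myRows)
```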

User · Answer

A quick workaround for this is to use numpy.core.defchararray. I also faced the same warning message and was able to resolve it using the above module.

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)
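The same function is also reachable through the public `numpy.char` namespace, which avoids importing from `numpy.core` (newer NumPy releases discourage that import path). A minimal sketch with made-up arrays standing in for `dataset1` and `dataset2`:

```python
import numpy as np

# Illustrative string arrays of equal length.
dataset1 = np.array(['Peter', 'Joe', 'Ann'])
dataset2 = np.array(['Peter', 'Jim', 'Ann'])

# np.char.equal compares two string arrays element by element.
resultdataset = np.char.equal(dataset1, dataset2)

print(resultdataset.tolist())
```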

User · Answer

If your arrays aren't too big or you don't have too many of them, you might be able to get away with forcing the left hand side of == to be a string:

myRows = df[str(df['Unnamed: 5']) == 'Peter'].index.tolist()

But this is about 1.5 times slower if df['Unnamed: 5'] is a string, 25-30 times slower if df['Unnamed: 5'] is a small numpy array (length = 10), and 150-160 times slower if it's a numpy array with length 100 (times averaged over 500 trials).

import time
from numpy import linspace, mean, zeros

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
print(mean(times_str_b)/mean(times_b))

print('\nString')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

Result:

Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
159.99474375821288

String
Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178
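It may help to see what `str()` does to an array before relying on it. In this sketch, `str(a)` produces the printed form of the whole array, so the comparison yields one bool. The `astype(str)` line is an alternative that is not part of the answer; it converts each element and keeps the comparison elementwise.

```python
import numpy as np

a = np.linspace(0, 5, 10)

# str(a) is the printed form of the whole array, so this is one bool.
whole = (str(a) == 'Peter')

# astype(str) converts each element, so the result has one bool per entry.
per_element = (a.astype(str) == 'Peter')

print(type(whole), per_element.shape)
```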

User · Answer

I had this code which was causing the error:

for t in dfObj['time']:
  if type(t) == str:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int

I changed it to this:

for t in dfObj['time']:
  try:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
  except Exception as e:
    print(e)
    continue

to avoid the comparison, which is throwing the warning, as stated above. I only had to handle the exception because of dfObj.loc in the for loop; maybe there is a way to tell it not to check the rows it has already changed.
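One possible way to skip already-converted rows, sketched under the assumption that the column holds a mix of date strings and integer timestamps: iterate over (label, value) pairs and write back by label, so no `t == dfObj.time` comparison is needed. The sample data is invented for the example.

```python
import dateutil.parser
import pandas as pd

# Illustrative data: one row is still text, one is already a timestamp.
dfObj = pd.DataFrame({'time': ['2020-01-01 00:00:00', 1577836800]})

# items() yields (index label, value); list() takes a snapshot so the
# frame can be modified safely inside the loop.
for idx, t in list(dfObj['time'].items()):
    if isinstance(t, str):
        the_date = dateutil.parser.parse(t)
        dfObj.at[idx, 'time'] = int(the_date.timestamp())

print(dfObj['time'].tolist())
```

Rows that already hold an integer fail the `isinstance` check and are left alone, so no try/except is required.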

User · Answer

I got this warning because I thought my column contained null strings, but on checking, it contained np.nan!

if df['column'] == '':

Changing my column to empty strings helped.
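A short sketch of that fix, using a made-up Series in place of `df['column']`: either fill the missing values with empty strings before comparing, or test for missing values directly.

```python
import numpy as np
import pandas as pd

column = pd.Series([np.nan, 'a', ''])

# Option 1: replace missing values with empty strings, then compare.
is_empty = column.fillna('').eq('')

# Option 2: test for missing values directly.
is_missing = column.isna()

print(is_empty.tolist(), is_missing.tolist())
```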
