What is the most efficient way to check if a value exists in a NumPy array?

Question

I have a very large NumPy array:

    1 40 3
    4 50 4
    5 60 7
    5 49 6
    6 70 8
    8 80 9
    8 72 1
    9 90 7
    ...

I want to check whether a value exists in the 1st column of the array. I've got a bunch of homegrown ways (e.g. iterating through each row and checking), but given the size of the array I'd like to find the most efficient method. Thanks!

User · Answer

To check multiple values, you can use numpy.in1d(), which is an element-wise version of the Python keyword in. If your data is sorted, you can use numpy.searchsorted():

    import numpy as np

    data = np.array([1, 4, 5, 5, 6, 8, 8, 9])
    values = [2, 3, 4, 6, 7]

    # True for each entry of values that occurs somewhere in data
    print(np.in1d(values, data))

    # data must be sorted: find each insertion point, then compare
    index = np.searchsorted(data, values)
    print(data[index] == values)
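A minimal sketch of the sorted-data approach, using the same values as above. One caveat it tries to cover: np.searchsorted returns insertion positions, which equal len(data) for a query larger than every element, so the index is clipped before the comparison. The helper name contains_sorted and the extra query 10 are hypothetical, added only for illustration. On recent NumPy releases, np.isin is the recommended replacement for np.in1d, so the sketch uses it for the unsorted case.

```python
import numpy as np

def contains_sorted(sorted_data, values):
    """Return a boolean array: True where each value occurs in sorted_data."""
    values = np.asarray(values)
    # Position where each value would be inserted to keep sorted_data ordered.
    idx = np.searchsorted(sorted_data, values)
    # A value larger than every element gets idx == len(sorted_data);
    # clip it so the lookup below stays in bounds.
    idx = np.clip(idx, 0, len(sorted_data) - 1)
    return sorted_data[idx] == values

first_column = np.array([1, 4, 5, 5, 6, 8, 8, 9])  # already sorted
queries = [2, 3, 4, 6, 7, 10]

found_sorted = contains_sorted(first_column, queries)
found_isin = np.isin(queries, first_column)  # does not require sorted data
print(found_sorted)
print(found_isin)
```

Both calls should agree; the searchsorted route only pays off when the data is already sorted, since sorting it first costs more than a single linear scan.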

User · Answer

The most obvious to me would be:

    np.any(my_array[:, 0] == value)
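As a rough illustration, here is that expression applied to the eight rows shown in the question. The variable names mask, exists and rows are assumptions for the example; np.flatnonzero is included only to show that the same boolean mask also tells you which rows matched.

```python
import numpy as np

# The rows listed in the question.
my_array = np.array([
    [1, 40, 3],
    [4, 50, 4],
    [5, 60, 7],
    [5, 49, 6],
    [6, 70, 8],
    [8, 80, 9],
    [8, 72, 1],
    [9, 90, 7],
])

value = 8
mask = my_array[:, 0] == value   # boolean array, one entry per row
exists = bool(np.any(mask))      # True if any row matches
rows = np.flatnonzero(mask)      # indices of the matching rows
print(exists, rows)
```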

User · Answer

Fascinating. I needed to improve the speed of a series of loops that must perform matching index determination in this same way, so I decided to time all the solutions here, along with some riffs.

Here are my speed tests for Python 2.7.10:

    import timeit

    timeit.timeit('N.any(N.in1d(sids, val))', setup='import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
    18.86137104034424

    timeit.timeit('val in sids', setup='import numpy as N; val = 20010401020091; sids = [20010401010101+x for x in range(1000)]')
    15.061666011810303

    timeit.timeit('N.in1d(sids, val)', setup='import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
    11.613027095794678

    timeit.timeit('N.any(val == sids)', setup='import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
    7.670552015304565

    timeit.timeit('val in sids', setup='import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
    5.610057830810547

    timeit.timeit('val == sids', setup='import numpy as N; val = 20010401020091; sids = N.array([20010401010101+x for x in range(1000)])')
    1.6632978916168213

    timeit.timeit('val in sids', setup='import numpy as N; val = 20010401020091; sids = set([20010401010101+x for x in range(1000)])')
    0.0548710823059082

    timeit.timeit('val in sids', setup='import numpy as N; val = 20010401020091; sids = dict(zip([20010401010101+x for x in range(1000)], [True,]*1000))')
    0.054754018783569336

Very surprising. Orders of magnitude difference!

To summarize, if you just want to know whether something's in a 1D list or not:

    19s     N.any(N.in1d(numpy array))
    15s     x in (list)
    8s      N.any(x == numpy array)
    6s      x in (numpy array)
    0.05s   x in (set or a dictionary)

If you want to know where something is in the list as well (order is important):

    12s     N.in1d(x, numpy array)
    2s      x == (numpy array)
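The figures above come from Python 2.7.10 and will differ on other machines and versions. A minimal Python 3 sketch for rerunning the comparison yourself might look like the following. It reuses the same val and the same 1000 IDs, but the names sids_list, sids_array, sids_set, candidates and timings are assumptions, np.isin stands in for np.in1d, and the repeat count is cut from timeit's default of 1,000,000 to 10,000, so absolute numbers are not comparable with the ones quoted.

```python
import timeit
import numpy as np

val = 20010401020091
sids_list = [20010401010101 + x for x in range(1000)]
sids_array = np.array(sids_list)
sids_set = set(sids_list)

# Each candidate answers "is val among the 1000 IDs?" in a different way.
candidates = {
    "np.any(np.isin(array, val))": lambda: np.any(np.isin(sids_array, val)),
    "val in list": lambda: val in sids_list,
    "np.any(val == array)": lambda: np.any(val == sids_array),
    "val in array": lambda: val in sids_array,
    "val in set": lambda: val in sids_set,
}

# number is kept small so the script finishes quickly;
# raise it for steadier measurements.
timings = {
    name: timeit.timeit(fn, number=10_000)
    for name, fn in candidates.items()
}

# Print slowest first, mirroring the summary above.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{seconds:8.4f} s  {name}")
```

Note that val is deliberately absent from the IDs, so every candidate has to do its worst-case amount of work.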

User · Answer

The most convenient way, in my opinion, is:

    (Val in X[:, col_num])

where Val is the value that you want to check for and X is the array. In your example, suppose you want to check whether the value 8 exists in the third column. Simply write:

    (8 in X[:, 2])

This will return True if 8 is in the third column, and False otherwise.

User · Answer

How about:

    if value in my_array[:, col_num]:
        do_whatever

Edit: I think __contains__ is implemented in such a way that this is the same as @detly's version.
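A small sketch of that equivalence, assuming a three-row array X made up for the example: the in operator on a column slice and the explicit np.any comparison should give the same answer. The slice X[:, col_num] is a view, so no column copy is made in either case.

```python
import numpy as np

# Hypothetical sample data; any 2D array works the same way.
X = np.array([
    [1, 40, 3],
    [4, 50, 4],
    [8, 80, 9],
])
col_num = 2
value = 9

via_in = value in X[:, col_num]                  # uses ndarray.__contains__
via_any = bool(np.any(X[:, col_num] == value))   # explicit comparison
print(via_in, via_any)

# Without the column slice, `in` looks at every element of the array,
# so 40 is reported as present even though it is not in column 2.
print(40 in X, 40 in X[:, col_num])
```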

User · Answer

Adding to @HYRY's answer: in1d seems to be fastest for NumPy. This is using NumPy 1.8 and Python 2.7.6.

In this test in1d was fastest, although 10 in a looks cleaner:

    from numpy import arange, in1d

    a = arange(0, 99999, 3)
    %timeit 10 in a
    %timeit in1d(a, 10)

    10000 loops, best of 3: 150 µs per loop
    10000 loops, best of 3: 61.9 µs per loop

Constructing a set is slower than calling in1d, but once it is built, checking whether a value exists is much faster (about 47 ns versus 61.9 µs):

    s = set(range(0, 99999, 3))
    %timeit 10 in s

    10000000 loops, best of 3: 47 ns per loop
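The trade-off above suggests building the set once and reusing it when many lookups are expected. A minimal sketch, assuming the same array a and a hypothetical list of queries:

```python
import numpy as np

a = np.arange(0, 99999, 3)

# One-time O(n) construction; tolist() yields plain Python ints.
lookup = set(a.tolist())

# Each membership test afterwards is an average O(1) hash lookup.
queries = [10, 9, 99996, 99999]
hits = [q in lookup for q in queries]
print(hits)
```

For a single lookup the construction cost dominates, so scanning the array directly is likely the better choice in that case.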
