[python] Efficient thresholding filter of an array with numpy

I need to filter an array to remove the elements that are lower than a certain threshold. My current code is like this:

threshold = 5
a = numpy.array(range(10)) # testing data
b = numpy.array(filter(lambda x: x >= threshold, a))

The problem is that this creates a temporary list, using a filter with a lambda function (slow).

As this is a quite simple operation, maybe there is a numpy function that does it in an efficient way, but I've been unable to find it.

I've thought that another way to achieve this could be sorting the array, finding the index of the threshold and returning a slice from that index onwards, but even if this would be faster for small inputs (and it won't be noticeable anyway), its definitively asymptotically less efficient as the input size grows.

Any ideas? Thanks!

Update: I took some measurements too, and the sorting+slicing was still twice as fast than the pure python filter when the input was 100.000.000 entries.

In [321]: r = numpy.random.uniform(0, 1, 100000000)

In [322]: %timeit test1(r) # filter
1 loops, best of 3: 21.3 s per loop

In [323]: %timeit test2(r) # sort and slice
1 loops, best of 3: 11.1 s per loop

In [324]: %timeit test3(r) # boolean indexing
1 loops, best of 3: 1.26 s per loop

This question is related to python filter numpy threshold

The answer is


b = a[a>threshold] this should do

I tested as follows:

import numpy as np, datetime
# array of zeros and ones interleaved
lrg = np.arange(2).reshape((2,-1)).repeat(1000000,-1).flatten()

t0 = datetime.datetime.now()
flt = lrg[lrg==0]
print datetime.datetime.now() - t0

t0 = datetime.datetime.now()
flt = np.array(filter(lambda x:x==0, lrg))
print datetime.datetime.now() - t0

I got

$ python test.py
0:00:00.028000
0:00:02.461000

http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to filter

Monitoring the Full Disclosure mailinglist Pyspark: Filter dataframe based on multiple conditions How Spring Security Filter Chain works Copy filtered data to another sheet using VBA Filter object properties by key in ES6 How do I filter date range in DataTables? How do I filter an array with TypeScript in Angular 2? Filtering array of objects with lodash based on property value How to filter an array from all elements of another array How to specify "does not contain" in dplyr filter

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to threshold

Efficient thresholding filter of an array with numpy