[python] Find nearest value in numpy array

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

This question is related to python search numpy

The answer is


This is a vectorized version of unutbu's answer:

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    indices = np.abs(array - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)

Here's a version that will handle a non-scalar "values" array:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

For 2d array, to determine the i, j position of nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

Summary of answer: If one has a sorted array then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array then if array is large, one should consider first using an O(n logn) sort and then bisection, and if array is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first - note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.

Bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

Now I'll define the code from the other answers, they each return an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

Now I'll time the codes: Note methods 1,2,4,5 don't correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 - 1000 times faster). For smaller arrays it's ~2-100 times faster.


With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]

All the answers are beneficial to gather the information to write efficient code. However, I have written a small Python script to optimize for various cases. It will be the best case if the provided array is sorted. If one searches the index of the nearest point of a specified value, then bisect module is the most time efficient. When one search the indices correspond to an array, the numpy searchsorted is most efficient.

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side="left") CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar, randpts, side="left") CPU times: user 4 ms, sys: 0 ns, total: 4 ms Wall time: 1.2 ms

If we follow the multiplicative rule, then numpy should take ~100 ms which implies ~83X faster.


Here's an extension to find the nearest vector in an array of vectors.

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])

Here is a version with scipy for @Ari Onasafari, answer "to find the nearest vector in an array of vectors"

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

If you don't want to use numpy this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]

Maybe helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

I think the most pythonic way would be:

 num = 65 # Input number
 array = n.random.random((10))*100 # Given array 
 nearest_idx = n.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
 nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

This is the basic code. You can use it as a function if you want


IF your array is sorted and is very large, this is a much faster solution:

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

This scales to very large arrays. You can easily modify the above to sort in the method if you can't assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.


import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:

  1. The function below works whether or not the input array is sorted.

  2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

Here is a fast vectorized version of @Dimitri's solution if you have many values to search for (values can be multi-dimensional array):

#`values` should be sorted
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

Benchmarks

> 100 times faster than using a for loop with @Demitri's solution`

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

Here is a version that works with 2D arrays, using scipy's cdist function if the user has it, and a simpler distance calculation if they don't.

By default, the output is the index that is closest to the value you input, but you can change that with the output keyword to be one of 'index', 'value', or 'both', where 'value' outputs array[index] and 'both' outputs index, array[index].

For very large arrays, you may need to use kind='euclidean', as the default scipy cdist function may run out of memory.

This is maybe not the absolute fastest solution, but it is quite close.

def find_nearest_2d(array, value, kind='cdist', output='index'):
    # 'array' must be a 2D array
    # 'value' must be a 1D array with 2 elements
    # 'kind' defines what method to use to calculate the distances. Can choose one
    #    of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
    #    arrays. Otherwise, cdist is much faster.
    # 'output' defines what the output should be. Can be 'index' (default) to return
    #    the index of the array that is closest to the value, 'value' to return the
    #    value that is closest, or 'both' to return index,value
    import numpy as np
    if kind == 'cdist':
        try: from scipy.spatial.distance import cdist
        except ImportError:
            print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
            kind = 'euclidean'
    index = np.where(array == value)[0] # Make sure the value isn't in the array
    if index.size == 0:
        if kind == 'cdist': index = np.argmin(cdist([value],array)[0])
        elif kind == 'euclidean': index = np.argmin(np.sum((np.array(array)-np.array(value))**2.,axis=1))
        else: raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
    if output == 'index': return index
    elif output == 'value': return array[index]
    elif output == 'both': return index,array[index]
    else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation Find a file by name in Visual Studio Code Search all the occurrences of a string in the entire project in Android Studio Java List.contains(Object with field value equal to x) Trigger an action after selection select2 How can I search for a commit message on GitHub? SQL search multiple values in same field Find a string by searching all tables in SQL Server Management Studio 2008 Search File And Find Exact Match And Print Line? Java - Search for files in a directory How to put a delay on AngularJS instant search?

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index