[python] Most efficient way to find mode in numpy array

I have a 2D array containing integers (both positive or negative). Each row represents the values over time for a particular spatial site, whereas each column represents values for various spatial sites for a given time.

So if the array is like:

1 3 4 2 2 7
5 2 2 1 4 1
3 3 2 2 1 1

The result should be

1 3 2 2 2 1

Note that when there are multiple values for mode, any one (selected randomly) may be set as mode.

I can iterate over the columns finding mode one at a time but I was hoping numpy might have some in-built function to do that. Or if there is a trick to find that efficiently without looping.

This question is related to python numpy 2d mode

The answer is


Expanding on this method, applied to finding the mode of the data where you may need the index of the actual array to see how far away the value is from the center of the distribution.

(_, idx, counts) = np.unique(a, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
mode = a[index]

Remember to discard the mode when len(np.argmax(counts)) > 1, also to validate if it is actually representative of the central distribution of your data you may check whether it falls inside your standard deviation interval.


simplest way in Python to get the mode of an list or array a

   import statistics
   print("mode = "+str(statistics.(mode(a)))

That's it


Update

The scipy.stats.mode function has been significantly optimized since this post, and would be the recommended method

Old answer

This is a tricky problem, since there is not much out there to calculate mode along an axis. The solution is straight forward for 1-D arrays, where numpy.bincount is handy, along with numpy.unique with the return_counts arg as True. The most common n-dimensional function I see is scipy.stats.mode, although it is prohibitively slow- especially for large arrays with many unique values. As a solution, I've developed this function, and use it heavily:

import numpy

def mode(ndarray, axis=0):
    # Check inputs
    ndarray = numpy.asarray(ndarray)
    ndim = ndarray.ndim
    if ndarray.size == 1:
        return (ndarray[0], 1)
    elif ndarray.size == 0:
        raise Exception('Cannot compute mode on empty array')
    try:
        axis = range(ndarray.ndim)[axis]
    except:
        raise Exception('Axis "{}" incompatible with the {}-dimension array'.format(axis, ndim))

    # If array is 1-D and numpy version is > 1.9 numpy.unique will suffice
    if all([ndim == 1,
            int(numpy.__version__.split('.')[0]) >= 1,
            int(numpy.__version__.split('.')[1]) >= 9]):
        modals, counts = numpy.unique(ndarray, return_counts=True)
        index = numpy.argmax(counts)
        return modals[index], counts[index]

    # Sort array
    sort = numpy.sort(ndarray, axis=axis)
    # Create array to transpose along the axis and get padding shape
    transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
    shape = list(sort.shape)
    shape[axis] = 1
    # Create a boolean array along strides of unique values
    strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                 numpy.diff(sort, axis=axis) == 0,
                                 numpy.zeros(shape=shape, dtype='bool')],
                                axis=axis).transpose(transpose).ravel()
    # Count the stride lengths
    counts = numpy.cumsum(strides)
    counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
    counts[strides] = 0
    # Get shape of padded counts and slice to return to the original shape
    shape = numpy.array(sort.shape)
    shape[axis] += 1
    shape = shape[transpose]
    slices = [slice(None)] * ndim
    slices[axis] = slice(1, None)
    # Reshape and compute final counts
    counts = counts.reshape(shape).transpose(transpose)[slices] + 1

    # Find maximum counts and return modals/counts
    slices = [slice(None, i) for i in sort.shape]
    del slices[axis]
    index = numpy.ogrid[slices]
    index.insert(axis, numpy.argmax(counts, axis=axis))
    return sort[index], counts[index]

Result:

In [2]: a = numpy.array([[1, 3, 4, 2, 2, 7],
                         [5, 2, 2, 1, 4, 1],
                         [3, 3, 2, 2, 1, 1]])

In [3]: mode(a)
Out[3]: (array([1, 3, 2, 2, 1, 1]), array([1, 2, 2, 2, 1, 2]))

Some benchmarks:

In [4]: import scipy.stats

In [5]: a = numpy.random.randint(1,10,(1000,1000))

In [6]: %timeit scipy.stats.mode(a)
10 loops, best of 3: 41.6 ms per loop

In [7]: %timeit mode(a)
10 loops, best of 3: 46.7 ms per loop

In [8]: a = numpy.random.randint(1,500,(1000,1000))

In [9]: %timeit scipy.stats.mode(a)
1 loops, best of 3: 1.01 s per loop

In [10]: %timeit mode(a)
10 loops, best of 3: 80 ms per loop

In [11]: a = numpy.random.random((200,200))

In [12]: %timeit scipy.stats.mode(a)
1 loops, best of 3: 3.26 s per loop

In [13]: %timeit mode(a)
1000 loops, best of 3: 1.75 ms per loop

EDIT: Provided more of a background and modified the approach to be more memory-efficient


from collections import Counter

n = int(input())
data = sorted([int(i) for i in input().split()])

sorted(sorted(Counter(data).items()), key = lambda x: x[1], reverse = True)[0][0]

print(Mean)

The Counter(data) counts the frequency and returns a defaultdict. sorted(Counter(data).items()) sorts using the keys, not the frequency. Finally, need to sorted the frequency using another sorted with key = lambda x: x[1]. The reverse tells Python to sort the frequency from the largest to the smallest.


A neat solution that only uses numpy (not scipy nor the Counter class):

A = np.array([[1,3,4,2,2,7], [5,2,2,1,4,1], [3,3,2,2,1,1]])

np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=0, arr=A)

array([1, 3, 2, 2, 1, 1])


If you want to use numpy only:

x = [-1, 2, 1, 3, 3]
vals,counts = np.unique(x, return_counts=True)

gives

(array([-1,  1,  2,  3]), array([1, 1, 1, 2]))

And extract it:

index = np.argmax(counts)
return vals[index]

I think a very simple way would be to use the Counter class. You can then use the most_common() function of the Counter instance as mentioned here.

For 1-d arrays:

import numpy as np
from collections import Counter

nparr = np.arange(10) 
nparr[2] = 6 
nparr[3] = 6 #6 is now the mode
mode = Counter(nparr).most_common(1)
# mode will be [(6,3)] to give the count of the most occurring value, so ->
print(mode[0][0])    

For multiple dimensional arrays (little difference):

import numpy as np
from collections import Counter

nparr = np.arange(10) 
nparr[2] = 6 
nparr[3] = 6 
nparr = nparr.reshape((10,2,5))     #same thing but we add this to reshape into ndarray
mode = Counter(nparr.flatten()).most_common(1)  # just use .flatten() method

# mode will be [(6,3)] to give the count of the most occurring value, so ->
print(mode[0][0])

This may or may not be an efficient implementation, but it is convenient.


if you want to find mode as int Value here is the easiest way I was trying to find out mode of Array using Scipy Stats but the problem is that output of the code look like:

ModeResult(mode=array(2), count=array([[1, 2, 2, 2, 1, 2]])) , I only want the Integer output so if you want the same just try this

import numpy as np
from scipy import stats
numbers = list(map(int, input().split())) 
print(int(stats.mode(numbers)[0]))

Last line is enough to print Mode Value in Python: print(int(stats.mode(numbers)[0]))


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to 2d

Declare an empty two-dimensional array in Javascript? Most efficient way to find mode in numpy array 2D cross-platform game engine for Android and iOS? A simple algorithm for polygon intersection Difference between SurfaceView and View? Drawing Isometric game worlds Calculating a 2D Vector's Cross Product

Examples related to mode

Most efficient way to find mode in numpy array Write a mode method in Java to find the most frequently occurring element in an array GroupBy pandas DataFrame and select most common value Finding the mode of a list Shortcut to exit scale mode in VirtualBox Get the element with the highest occurrence in an array