Most efficient way to find mode in numpy array

Question

I have a 2D array containing integers (both positive or negative). Each row represents the values over time for a particular spatial site, whereas each column represents values for various spatial sites for a given time.

So if the array is like:

    1 3 4 2 2 7
    5 2 2 1 4 1
    3 3 2 2 1 1

The result should be:

    1 3 2 2 2 1

Note that when there are multiple values for mode, any one (selected randomly) may be set as mode.

I can iterate over the columns finding mode one at a time, but I was hoping numpy might have some in-built function to do that. Or if there is a trick to find that efficiently without looping.

User · Answer

If you want to use numpy only:

    import numpy as np

    x = [-1, 2, 1, 3, 3]
    vals, counts = np.unique(x, return_counts=True)

gives

    (array([-1,  1,  2,  3]), array([1, 1, 1, 2]))

And extract it:

    index = np.argmax(counts)
    mode = vals[index]
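Since the question asks for the mode of each column, here is a minimal sketch of how the np.unique idea could be applied column by column. The helper name column_modes is made up for illustration, and it still loops in Python, so treat it as a baseline rather than the fastest option.

```python
import numpy as np

def column_modes(a):
    """Return the mode of each column of a 2-D array (hypothetical helper)."""
    a = np.asarray(a)
    modes = np.empty(a.shape[1], dtype=a.dtype)
    for j in range(a.shape[1]):
        vals, counts = np.unique(a[:, j], return_counts=True)
        # argmax returns the first maximum and vals is sorted,
        # so ties resolve to the smallest value
        modes[j] = vals[np.argmax(counts)]
    return modes

a = np.array([[1, 3, 4, 2, 2, 7],
              [5, 2, 2, 1, 4, 1],
              [3, 3, 2, 2, 1, 1]])
print(column_modes(a))  # [1 3 2 2 1 1]
```

Negative integers work here as well, because np.unique does not depend on the values being non-negative.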

User · Answer

Expanding on this method, applied to finding the mode of the data where you may need the index of the actual array to see how far away the value is from the center of the distribution:

    _, idx, counts = np.unique(a, return_index=True, return_counts=True)
    index = idx[np.argmax(counts)]
    mode = a[index]

Remember to discard the mode when it is not unique, that is, when more than one value reaches the maximum count (np.count_nonzero(counts == counts.max()) > 1); np.argmax only returns the first such position. To validate whether the mode is actually representative of the central distribution of your data, you may also check whether it falls inside your standard deviation interval.
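As a rough sketch of the tie check described here, the function below returns the mode and the index of its first occurrence, or None when several values share the top count. The name unique_mode_or_none is an assumption for illustration only.

```python
import numpy as np

def unique_mode_or_none(a):
    """Return (mode, index of its first occurrence), or None on a tie."""
    a = np.asarray(a).ravel()
    _, idx, counts = np.unique(a, return_index=True, return_counts=True)
    # More than one value reaching the maximum count means the mode is ambiguous
    if np.count_nonzero(counts == counts.max()) > 1:
        return None
    first_index = idx[np.argmax(counts)]
    return int(a[first_index]), int(first_index)

print(unique_mode_or_none([-1, 2, 1, 3, 3]))  # (3, 3)
print(unique_mode_or_none([1, 1, 2, 2]))      # None
```

The int() conversions assume integer data, as in the question.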

User · Answer

Check scipy.stats.mode() (inspired by @tom10's comment):

    import numpy as np
    from scipy import stats

    a = np.array([[1, 3, 4, 2, 2, 7],
                  [5, 2, 2, 1, 4, 1],
                  [3, 3, 2, 2, 1, 1]])

    m = stats.mode(a)
    print(m)

Output:

    ModeResult(mode=array([[1, 3, 2, 2, 1, 1]]), count=array([[1, 2, 2, 2, 1, 2]]))

As you can see, it returns both the mode as well as the counts. You can select the modes directly via m[0]:

    print(m[0])

Output:

    [[1 3 2 2 1 1]]

User · Answer

    from collections import Counter

    n = int(input())
    data = sorted([int(i) for i in input().split()])

    mode = sorted(sorted(Counter(data).items()), key=lambda x: x[1], reverse=True)[0][0]

    print(mode)

Counter(data) counts the frequency of each value and returns a Counter (a dict subclass). sorted(Counter(data).items()) sorts by the keys, not the frequency. Finally, the result is sorted by frequency using another sorted with key=lambda x: x[1]. reverse=True tells Python to sort the frequencies from the largest to the smallest.
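Because Python's sort is stable, the double sort above ends up picking the smallest value among those tied for the highest count. A minimal sketch of the same rule in a single pass, with a hypothetical function name, might look like this:

```python
from collections import Counter

def mode_smallest_on_tie(data):
    """Most frequent value; ties resolve to the smallest value."""
    counts = Counter(data)
    # Highest count wins; among equal counts, the smaller value wins
    value, _ = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return value

print(mode_smallest_on_tie([3, 1, 3, 1, 2]))  # 1
```

This assumes non-empty input; min() raises ValueError on an empty sequence.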

User · Answer

The simplest way in Python to get the mode of a list or array a:

    import statistics

    print("mode = " + str(statistics.mode(a)))

That's it.
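For the 2D case in the question, one possible sketch is to apply the statistics module to each column. This assumes Python 3.8 or later, where statistics.mode returns the first mode encountered instead of raising on ties, and where statistics.multimode is available to list every tied value.

```python
import statistics

rows = [[1, 3, 4, 2, 2, 7],
        [5, 2, 2, 1, 4, 1],
        [3, 3, 2, 2, 1, 1]]

# zip(*rows) iterates over the columns of a list of rows
modes_per_column = [statistics.mode(col) for col in zip(*rows)]
print(modes_per_column)  # [1, 3, 2, 2, 2, 1]

# multimode lists all values tied for the highest count, in first-seen order
tied = [statistics.multimode(col) for col in zip(*rows)]
print(tied[0])  # [1, 5, 3]
```

This is pure Python, so it is convenient rather than fast for large arrays.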

User · Answer

A neat solution that only uses numpy (not scipy nor the Counter class):

    A = np.array([[1, 3, 4, 2, 2, 7], [5, 2, 2, 1, 4, 1], [3, 3, 2, 2, 1, 1]])

    np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=0, arr=A)

    array([1, 3, 2, 2, 1, 1])
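One caveat: np.bincount only accepts non-negative integers, while the question mentions negative values. A minimal sketch of a workaround is to shift each column by its minimum before counting; the helper name bincount_mode is an assumption for illustration.

```python
import numpy as np

def bincount_mode(x):
    """Mode of a 1-D integer array that may contain negative values."""
    x = np.asarray(x)
    offset = x.min()
    # Shift so the smallest value maps to bin 0, then shift the winner back
    return np.bincount(x - offset).argmax() + offset

A = np.array([[-1, 3, 4],
              [-1, 2, 4],
              [5, 3, -7]])
modes = np.apply_along_axis(bincount_mode, axis=0, arr=A)
print(modes)  # [-1  3  4]
```

The bin array grows with the range of values in each column, so this suits data whose values span a modest range.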

User · Answer

I think a very simple way would be to use the Counter class. You can then use the most_common() function of the Counter instance as mentioned here.

For 1-d arrays:

    import numpy as np
    from collections import Counter

    nparr = np.arange(10)
    nparr[2] = 6
    nparr[3] = 6  # 6 is now the mode
    mode = Counter(nparr).most_common(1)
    # mode will be [(6, 3)] to give the count of the most occurring value, so ->
    print(mode[0][0])

For multiple dimensional arrays (little difference):

    import numpy as np
    from collections import Counter

    nparr = np.arange(10)
    nparr[2] = 6
    nparr[3] = 6
    nparr = nparr.reshape((2, 5))  # same thing but we add this to reshape into ndarray
    mode = Counter(nparr.flatten()).most_common(1)  # just use .flatten() method

    # mode will be [(6, 3)] to give the count of the most occurring value, so ->
    print(mode[0][0])

This may or may not be an efficient implementation, but it is convenient.
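Flattening gives one mode for the whole array. If a mode per column is wanted, as in the question, a sketch along the same lines is to build one Counter per column. The variable names are illustrative.

```python
from collections import Counter
import numpy as np

a = np.array([[1, 3, 4, 2, 2, 7],
              [5, 2, 2, 1, 4, 1],
              [3, 3, 2, 2, 1, 1]])

# a.T iterates over columns; tolist() turns NumPy scalars into plain ints
value_and_count = [Counter(col.tolist()).most_common(1)[0] for col in a.T]
modes = [value for value, _ in value_and_count]
print(modes)  # [1, 3, 2, 2, 2, 1]
```

On ties, most_common returns the value encountered first in the column, which is why the fifth column yields 2 here.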

User · Answer

Update: The scipy.stats.mode function has been significantly optimized since this post, and would be the recommended method.

Old answer:

This is a tricky problem, since there is not much out there to calculate mode along an axis. The solution is straightforward for 1-D arrays, where numpy.bincount is handy, along with numpy.unique with the return_counts arg as True. The most common n-dimensional function I see is scipy.stats.mode, although it is prohibitively slow, especially for large arrays with many unique values. As a solution, I've developed this function, and use it heavily:

    import numpy

    def mode(ndarray, axis=0):
        # Check inputs
        ndarray = numpy.asarray(ndarray)
        ndim = ndarray.ndim
        if ndarray.size == 1:
            return (ndarray[0], 1)
        elif ndarray.size == 0:
            raise Exception('Cannot compute mode on empty array')
        try:
            axis = range(ndarray.ndim)[axis]
        except (IndexError, TypeError):
            raise Exception('Axis "{}" incompatible with the {}-dimension array'.format(axis, ndim))

        # If array is 1-D and numpy version is > 1.9 numpy.unique will suffice
        if all([ndim == 1,
                int(numpy.__version__.split('.')[0]) >= 1,
                int(numpy.__version__.split('.')[1]) >= 9]):
            modals, counts = numpy.unique(ndarray, return_counts=True)
            index = numpy.argmax(counts)
            return modals[index], counts[index]

        # Sort array
        sort = numpy.sort(ndarray, axis=axis)
        # Create array to transpose along the axis and get padding shape
        transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
        shape = list(sort.shape)
        shape[axis] = 1
        # Create a boolean array along strides of unique values
        strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                     numpy.diff(sort, axis=axis) == 0,
                                     numpy.zeros(shape=shape, dtype='bool')],
                                    axis=axis).transpose(transpose).ravel()
        # Count the stride lengths
        counts = numpy.cumsum(strides)
        counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
        counts[strides] = 0
        # Get shape of padded counts and slice to return to the original shape
        shape = numpy.array(sort.shape)
        shape[axis] += 1
        shape = shape[transpose]
        slices = [slice(None)] * ndim
        slices[axis] = slice(1, None)
        # Reshape and compute final counts
        # (index with a tuple; recent NumPy rejects a list of slices)
        counts = counts.reshape(shape).transpose(transpose)[tuple(slices)] + 1

        # Find maximum counts and return modals/counts
        slices = [slice(None, i) for i in sort.shape]
        del slices[axis]
        index = list(numpy.ogrid[tuple(slices)])
        index.insert(axis, numpy.argmax(counts, axis=axis))
        index = tuple(index)
        return sort[index], counts[index]

Result:

    In [2]: a = numpy.array([[1, 3, 4, 2, 2, 7],
                             [5, 2, 2, 1, 4, 1],
                             [3, 3, 2, 2, 1, 1]])

    In [3]: mode(a)
    Out[3]: (array([1, 3, 2, 2, 1, 1]), array([1, 2, 2, 2, 1, 2]))

Some benchmarks:

    In [4]: import scipy.stats

    In [5]: a = numpy.random.randint(1, 10, (1000, 1000))

    In [6]: %timeit scipy.stats.mode(a)
    10 loops, best of 3: 41.6 ms per loop

    In [7]: %timeit mode(a)
    10 loops, best of 3: 46.7 ms per loop

    In [8]: a = numpy.random.randint(1, 500, (1000, 1000))

    In [9]: %timeit scipy.stats.mode(a)
    1 loops, best of 3: 1.01 s per loop

    In [10]: %timeit mode(a)
    10 loops, best of 3: 80 ms per loop

    In [11]: a = numpy.random.random((200, 200))

    In [12]: %timeit scipy.stats.mode(a)
    1 loops, best of 3: 3.26 s per loop

    In [13]: %timeit mode(a)
    1000 loops, best of 3: 1.75 ms per loop

EDIT: Provided more of a background and modified the approach to be more memory-efficient.
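The core idea of the function, sorting and then measuring runs of equal neighbours, may be easier to follow in one dimension. The sketch below is a simplified illustration of that idea, not the author's n-dimensional implementation; the name mode_by_sorting is hypothetical and non-empty input is assumed.

```python
import numpy as np

def mode_by_sorting(x):
    """1-D mode via sort plus run lengths; returns (value, count)."""
    s = np.sort(np.asarray(x).ravel())
    # True wherever a new run of equal values begins
    is_start = np.concatenate(([True], s[1:] != s[:-1]))
    starts = np.flatnonzero(is_start)
    # Run lengths are the gaps between consecutive run starts
    lengths = np.diff(np.concatenate((starts, [s.size])))
    best = np.argmax(lengths)
    return s[starts[best]], lengths[best]

value, count = mode_by_sorting([0.5, 0.25, 0.5, 1.0])
print(value, count)  # 0.5 2
```

Because it relies only on sorting and comparisons, this also works for floats and negative values, where np.bincount does not apply.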

User · Answer

If you want to find the mode as an int value, here is the easiest way. I was trying to find the mode of an array using scipy.stats, but the problem is that the output of the code looks like:

    ModeResult(mode=array(2), count=array([[1, 2, 2, 2, 1, 2]]))

I only want the integer output, so if you want the same, just try this:

    import numpy as np
    from scipy import stats

    numbers = list(map(int, input().split()))
    print(int(stats.mode(numbers)[0]))

The last line is enough to print the mode value in Python: print(int(stats.mode(numbers)[0]))
