[python] Find the most frequent number in a NumPy array

Suppose I have the following NumPy array:

a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])

How can I find the most frequent number in this array?

Here is a general solution that can be applied along an axis of an n-dimensional array, works for arbitrary values (not just non-negative integers), and uses only numpy. I've also found that it is much faster than scipy.stats.mode when there are many unique values.

import numpy

def mode(ndarray, axis=0):
    # Check inputs
    ndarray = numpy.asarray(ndarray)
    ndim = ndarray.ndim
    if ndarray.size == 1:
        return (ndarray[0], 1)
    elif ndarray.size == 0:
        raise Exception('Cannot compute mode on empty array')
    try:
        axis = range(ndarray.ndim)[axis]
    except (IndexError, TypeError):
        raise Exception('Axis "{}" incompatible with the {}-dimensional array'.format(axis, ndim))

    # If the array is 1-D and numpy >= 1.9, numpy.unique(..., return_counts=True) suffices
    numpy_version = tuple(int(v) for v in numpy.__version__.split('.')[:2])
    if ndim == 1 and numpy_version >= (1, 9):
        modals, counts = numpy.unique(ndarray, return_counts=True)
        index = numpy.argmax(counts)
        return modals[index], counts[index]

    # Sort array
    sort = numpy.sort(ndarray, axis=axis)
    # Create array to transpose along the axis and get padding shape
    transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
    shape = list(sort.shape)
    shape[axis] = 1
    # Create a boolean array along strides of unique values
    strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                 numpy.diff(sort, axis=axis) == 0,
                                 numpy.zeros(shape=shape, dtype='bool')],
                                axis=axis).transpose(transpose).ravel()
    # Count the stride lengths
    counts = numpy.cumsum(strides)
    counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
    counts[strides] = 0
    # Get shape of padded counts and slice to return to the original shape
    shape = numpy.array(sort.shape)
    shape[axis] += 1
    shape = shape[transpose]
    slices = [slice(None)] * ndim
    slices[axis] = slice(1, None)
    # Reshape and compute final counts (index with a tuple of slices)
    counts = counts.reshape(shape).transpose(transpose)[tuple(slices)] + 1

    # Find maximum counts and return modals/counts
    slices = [slice(None, i) for i in sort.shape]
    del slices[axis]
    index = list(numpy.ogrid[tuple(slices)])
    index.insert(axis, numpy.argmax(counts, axis=axis))
    return sort[tuple(index)], counts[tuple(index)]
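
For example, here is a quick usage sketch of the function above (the 2-D array is made up for illustration):

import numpy as np

a = np.array([1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1])
print(mode(a))           # mode value 1 with count 6

b = np.array([[1, 2, 2],
              [3, 3, 1]])
print(mode(b, axis=1))   # per-row modes [2, 3] with counts [2, 2]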

Expanding on this method, here is a variant for finding the mode where you also need the index of that value in the original array, e.g. to see how far it lies from the center of the distribution.

(_, idx, counts) = np.unique(a, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
mode = a[index]

Remember to discard the mode when the maximum count is shared by more than one value (see the sketch below).
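
One way to detect that tie, as a small sketch building on the snippet above:

import numpy as np

a = np.array([1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1])
_, idx, counts = np.unique(a, return_index=True, return_counts=True)

# More than one value shares the highest count -> the mode is ambiguous
is_tied = np.count_nonzero(counts == counts.max()) > 1
print(is_tied)  # False here: 1 is the unique mode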


Also, if you want to get the most frequent value (positive or negative) without loading any modules, you can use the following code:

lVals = [1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1]
print(max(map(lambda val: (lVals.count(val), val), set(lVals))))

I like the solution by JoshAdel.

But there is just one catch.

The np.bincount() solution only works on non-negative integers.

If you have strings, the collections.Counter solution will work for you.
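
For instance, a minimal sketch with made-up string data:

from collections import Counter
import numpy as np

words = np.array(["apple", "pear", "apple", "plum", "apple"])
value, count = Counter(words).most_common(1)[0]
print(value, count)  # apple 3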


Starting in Python 3.4, the standard library includes the statistics.mode function to return the single most common data point.

from statistics import mode

mode([1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1])
# 1

If there are multiple modes with the same frequency, statistics.mode returns the first one encountered (on Python 3.8 and later; earlier versions raise StatisticsError in that case).


Starting in Python 3.8, the statistics.multimode function returns a list of the most frequently occurring values in the order they were first encountered:

from statistics import multimode

multimode([1, 2, 3, 1, 2])
# [1, 2]

If you're willing to use SciPy:

>>> from scipy.stats import mode
>>> mode([1,2,3,1,2,1,1,1,3,2,2,1])
(array([ 1.]), array([ 6.]))
>>> most_frequent = mode([1,2,3,1,2,1,1,1,3,2,2,1])[0][0]
>>> most_frequent
1.0

I recently used collections.Counter in a project (and it tortured me).

In my opinion, Counter has very poor performance; it is just a class wrapping dict().

What's worse, if you use cProfile to profile its methods, you will see a lot of '__missing__' and '__instancecheck__' calls wasting the whole time.

Be careful with most_common(): every call invokes a sort, which makes it extremely slow, and most_common(x) invokes a heap-based selection, which is also slow.

By the way, numpy's bincount also has a problem: np.bincount([1, 2, 4000000]) returns an array with 4,000,001 elements.
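
To see the allocation issue concretely (a tiny sketch):

import numpy as np

# bincount allocates one counter for every integer from 0 up to the maximum value,
# so a single large value forces a huge result array
counts = np.bincount([1, 2, 4000000])
print(counts.size)  # 4000001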


You can use the following approach:

x = np.array([[2, 5, 5, 2], [2, 7, 8, 5], [2, 5, 7, 9]])
u, c = np.unique(x, return_counts=True)
print(u[c == np.amax(c)])

This will give the answer: array([2, 5])


In Python 3, the following should work (assuming a is a Python list; NumPy arrays have no .count method):

max(set(a), key=lambda x: a.count(x))

You may use

values, counts = np.unique(a, return_counts=True)

ind = np.argmax(counts)
print(values[ind])  # prints the most frequent element

ind = np.argpartition(-counts, kth=10)[:10]
print(values[ind])  # prints the 10 most frequent elements (assumes there are more than 10 unique values)

If two elements are equally frequent, this code returns only the first of them (the smaller value, since np.unique returns the values sorted).


While most of the answers above are useful, this one is for the case where you: 1) need to support non-positive-integer values (e.g. floats or negative integers), 2) aren't on Python 2.7 (which collections.Counter requires), and 3) prefer not to add a scipy (or even numpy) dependency. A pure Python 2.6 solution that runs in O(n log n) (i.e., efficiently) is just this:

from collections import defaultdict

a = [1,2,3,1,2,1,1,1,3,2,2,1]

d = defaultdict(int)
for i in a:
  d[i] += 1
most_frequent = sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]

Performance (using IPython) of some of the solutions found here:

>>> # small array
>>> a = [12,3,65,33,12,3,123,888000]
>>> 
>>> import collections
>>> collections.Counter(a).most_common()[0][0]
3
>>> %timeit collections.Counter(a).most_common()[0][0]
100000 loops, best of 3: 11.3 µs per loop
>>> 
>>> import numpy
>>> numpy.bincount(a).argmax()
3
>>> %timeit numpy.bincount(a).argmax()
100 loops, best of 3: 2.84 ms per loop
>>> 
>>> import scipy.stats
>>> scipy.stats.mode(a)[0][0]
3.0
>>> %timeit scipy.stats.mode(a)[0][0]
10000 loops, best of 3: 172 µs per loop
>>> 
>>> from collections import defaultdict
>>> def jjc(l):
...     d = defaultdict(int)
...     for i in l:
...         d[i] += 1
...     return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]
... 
>>> jjc(a)[0]
3
>>> %timeit jjc(a)[0]
100000 loops, best of 3: 5.58 µs per loop
>>> 
>>> max(map(lambda val: (a.count(val), val), set(a)))[1]
12
>>> %timeit max(map(lambda val: (a.count(val), val), set(a)))[1]
100000 loops, best of 3: 4.11 µs per loop
>>> 

The 'max' with 'set' approach is the fastest for small arrays like the one in the question.

According to @David Sanders, if you increase the array size to something like 100,000 elements, the "max w/set" algorithm ends up being the worst by far whereas the "numpy bincount" method is the best.
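
If you want to reproduce that comparison on a larger array yourself, here is a rough sketch (the array size and value range are arbitrary choices):

import collections
import timeit

import numpy as np

a = np.random.randint(0, 1000, size=100000).tolist()

print(timeit.timeit(lambda: max(set(a), key=a.count), number=1))                    # max w/ set
print(timeit.timeit(lambda: np.bincount(a).argmax(), number=1))                     # numpy bincount
print(timeit.timeit(lambda: collections.Counter(a).most_common(1)[0][0], number=1)) # Counter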