How to find the statistical mode

Question

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there a standard library function that implements the statistical mode for a vector (or list)?
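To make the distinction concrete, here is a minimal sketch of what base R reports for a small vector. The variable names are only illustrative.

```r
# mode() reports the storage mode of an object, not its most frequent value
nums  <- c(1, 2, 2, 3)
chars <- c("a", "b", "b")

storage_nums  <- mode(nums)    # "numeric"
storage_chars <- mode(chars)   # "character"

# mean() and median() behave as their names suggest
avg <- mean(nums)     # 2
mid <- median(nums)   # 2

print(c(storage_nums, storage_chars))
```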

User · Answer

I found Ken Williams' post above to be great. I added a few lines to account for NA values and made it a function for ease.

    Mode <- function(x, na.rm = FALSE) {
      if (na.rm) {
        x = x[!is.na(x)]
      }

      ux <- unique(x)
      return(ux[which.max(tabulate(match(x, ux)))])
    }
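A short usage sketch of the NA handling, assuming the function is defined as in this answer (it is repeated here so the snippet runs on its own; the test vector is made up):

```r
# Ken Williams' approach with an optional NA filter
Mode <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c(1, NA, NA, 2, 2, NA)

with_na    <- Mode(x)                  # NA: match() counts NA like any other value
without_na <- Mode(x, na.rm = TRUE)    # 2
```

Without na.rm = TRUE, NA is treated as an ordinary value and wins whenever it is the most frequent entry.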

User · Answer

Here is a function to find the mode:

    mode <- function(x) {
      unique_val <- unique(x)
      counts <- vector()
      for (i in seq_along(unique_val)) {
        counts[i] <- length(which(x == unique_val[i]))
      }
      position <- c(which(counts == max(counts)))
      if (mean(counts) == max(counts))
        mode_x <- "Mode does not exist"
      else
        mode_x <- unique_val[position]
      return(mode_x)
    }

User · Answer

Based on @Chris's function to calculate the mode or related metrics, however using Ken Williams's method to calculate frequencies. This one provides a fix for the case of no modes at all (all elements equally frequent), and some more readable method names.

    Mode <- function(x, method = "one", na.rm = FALSE) {
      x <- unlist(x)
      if (na.rm) {
        x <- x[!is.na(x)]
      }

      # Get unique values
      ux <- unique(x)
      n <- length(ux)

      # Get frequencies of all unique values
      frequencies <- tabulate(match(x, ux))
      modes <- frequencies == max(frequencies)

      # Determine number of modes
      nmodes <- sum(modes)
      nmodes <- ifelse(nmodes == n, 0L, nmodes)

      if (method %in% c("one", "mode", "") | is.na(method)) {
        # Return NA if not exactly one mode, else return the mode
        if (nmodes != 1) {
          return(NA)
        } else {
          return(ux[which(modes)])
        }
      } else if (method %in% c("n", "nmodes")) {
        # Return the number of modes
        return(nmodes)
      } else if (method %in% c("all", "modes")) {
        # Return NA if no modes exist, else return all modes
        if (nmodes > 0) {
          return(ux[which(modes)])
        } else {
          return(NA)
        }
      }
      warning("Warning: method not recognised.  Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
    }

Since it uses Ken's method to calculate frequencies, the performance is also optimised. Using AkselA's post, I benchmarked some of the previous answers to show how my function is close to Ken's in performance, with the conditionals for the various output options causing only minor overhead.

User · Answer

Another possible solution:

    Mode <- function(x) {
        if (is.numeric(x)) {
            x_table <- table(x)
            return(as.numeric(names(x_table)[which.max(x_table)]))
        }
    }

Usage:

    set.seed(100)
    v <- sample(x = 1:100, size = 1000000, replace = TRUE)
    system.time(Mode(v))

Output:

       user  system elapsed
       0.32    0.00    0.31

User · Answer

Could try the following function:

1. transform numeric values into a factor
2. use summary() to get the frequency table
3. return the mode, i.e. the index whose frequency is the largest
4. transform the factor back to numeric

Even if there is more than one mode, this function works well.

    mode <- function(x) {
      y <- as.factor(x)
      freq <- summary(y)
      mode <- names(freq)[freq[names(freq)] == max(freq)]
      as.numeric(mode)
    }

User · Answer

Here, another solution:

    freq <- tapply(mySamples, mySamples, length)
    # or freq <- table(mySamples)
    as.numeric(names(freq)[which.max(freq)])

User · Answer

I've written the following code in order to generate the mode.

    MODE <- function(dataframe) {
        DF <- as.data.frame(dataframe)

        MODE2 <- function(x) {
            if (is.numeric(x) == FALSE) {
                df <- as.data.frame(table(x))
                df <- df[order(df$Freq), ]
                m <- max(df$Freq)
                MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))

                if (sum(df$Freq) / length(df$Freq) == 1) {
                    warning("No Mode: Frequency of all values is 1", call. = FALSE)
                } else {
                    return(MODE1)
                }

            } else {
                df <- as.data.frame(table(x))
                df <- df[order(df$Freq), ]
                m <- max(df$Freq)
                MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))

                if (sum(df$Freq) / length(df$Freq) == 1) {
                    warning("No Mode: Frequency of all values is 1", call. = FALSE)
                } else {
                    return(MODE1)
                }
            }
        }

        return(as.vector(lapply(DF, MODE2)))
    }

Let's try it:

    MODE(mtcars)
    MODE(CO2)
    MODE(ToothGrowth)
    MODE(InsectSprays)

User · Answer

In case your observations are classes from the real numbers and you expect the mode to be 2.5 when your observations are 2, 2, 3, and 3, then you could estimate the mode with

    mode = l1 + i * (f1 - f0) / (2 * f1 - f0 - f2)

where l1 is the lower limit of the most frequent class, f1 the frequency of the most frequent class, f0 the frequency of the class before the most frequent class, f2 the frequency of the class after the most frequent class, and i the class interval, as given e.g. in 1, 2, 3:

    # Small Example
    x <- c(2, 2, 3, 3)   # Observations
    i <- 1               # Class interval

    z <- hist(x, breaks = seq(min(x) - 1.5 * i, max(x) + 1.5 * i, i), plot = F)  # Calculate frequency of classes
    mf <- which.max(z$counts)   # index of most frequent class
    zc <- z$counts
    z$breaks[mf] + i * (zc[mf] - zc[mf - 1]) / (2 * zc[mf] - zc[mf - 1] - zc[mf + 1])  # gives you the mode of 2.5

    # Larger Example
    set.seed(0)
    i <- 5               # Class interval
    x <- round(rnorm(100, mean = 100, sd = 10) / i) * i  # Observations

    z <- hist(x, breaks = seq(min(x) - 1.5 * i, max(x) + 1.5 * i, i), plot = F)
    mf <- which.max(z$counts)
    zc <- z$counts
    z$breaks[mf] + i * (zc[mf] - zc[mf - 1]) / (2 * zc[mf] - zc[mf - 1] - zc[mf + 1])  # gives you the mode of 99.5

In case you want the most frequent level and you have more than one most frequent level, you can get all of them, e.g. with:

    x <- c(2, 2, 3, 5, 5)
    names(which(max(table(x)) == table(x)))
    # "2" "5"
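The class-interval formula in this answer can be wrapped into a small helper. This is only a sketch: grouped_mode is a hypothetical name, and it assumes the observations sit on class midpoints spaced i apart, so that the most frequent class always has a neighbour on each side.

```r
# Hypothetical helper: mode of grouped data via the class-interval formula
# mode = l1 + i * (f1 - f0) / (2 * f1 - f0 - f2)
grouped_mode <- function(x, i) {
  # Pad the range by 1.5 class widths so the outermost classes are empty
  breaks <- seq(min(x) - 1.5 * i, max(x) + 1.5 * i, by = i)
  z  <- hist(x, breaks = breaks, plot = FALSE)
  zc <- z$counts
  mf <- which.max(zc)    # index of the most frequent class
  l1 <- z$breaks[mf]     # lower limit of that class
  f1 <- zc[mf]           # frequency of the most frequent class
  f0 <- zc[mf - 1]       # frequency of the class before it
  f2 <- zc[mf + 1]       # frequency of the class after it
  l1 + i * (f1 - f0) / (2 * f1 - f0 - f2)
}

small <- grouped_mode(c(2, 2, 3, 3), i = 1)   # 2.5, as in the small example
```

If two adjacent classes tie, which.max() picks the first one, and the formula then shifts the estimate toward the boundary they share.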

User · Answer

Here is my data.table solution that returns row-wise modes for a complete table. I use it to infer row class. It takes care of the new-ish set() function in data.table and should be pretty fast. It does not manage NA though, but that could be added by looking at the numerous other solutions on this page.

    majorityVote <- function(mat_classes) {
      # mat_classes = dt.pour.centroids_num
      dt.modes <- data.table(mode = integer(nrow(mat_classes)))
      for (i in 1:nrow(mat_classes)) {
        cur.row <- mat_classes[i]
        cur.mode <- which.max(table(t(cur.row)))
        set(dt.modes, i = i, j = "mode", value = cur.mode)
      }

      return(dt.modes)
    }

Possible usage:

    newClass <- majorityVote(my.dt)  # just a new vector with all the modes

User · Answer

Below is the code which can be used to find the mode of a vector variable in R.

    a <- table(vector)
    names(a[a == max(a)])

User · Answer

The generic function fmode in the collapse package, now available on CRAN, implements a C++ based mode based on index hashing. It is significantly faster than any of the above approaches. It comes with methods for vectors, matrices, data.frames and dplyr grouped tibbles. Syntax:

    fmode(x, g = NULL, w = NULL, ...)

where x can be one of the above objects, g supplies an optional grouping vector or list of grouping vectors (for grouped mode calculations, also performed in C++), and w (optionally) supplies a numeric weight vector. In the grouped tibble method, there is no g argument; you can do data %>% group_by(idvar) %>% fmode.

User · Answer

Adding in raster::modal() as an option, although note that raster is a hefty package and may not be worth installing if you don't do geospatial work.

The source code could be pulled out of https://github.com/rspatial/raster/blob/master/src/modal.cpp and https://github.com/rspatial/raster/blob/master/R/modal.R into a personal R package, for those who are particularly keen.

User · Answer

This works pretty fine:

    > a <- c(1, 1, 2, 2, 3, 3, 4, 4, 5)
    > names(table(a))[table(a) == max(table(a))]
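A short sketch of what the table()-based approach returns for this vector. Note that names() yields character strings, so a conversion is needed for numeric input; the variable names are illustrative.

```r
a <- c(1, 1, 2, 2, 3, 3, 4, 4, 5)

tab <- table(a)                             # frequency of each distinct value
modes_chr <- names(tab)[tab == max(tab)]    # "1" "2" "3" "4" (all tied values)
modes_num <- as.numeric(modes_chr)          # back to numeric: 1 2 3 4

print(modes_num)
```

Storing the table once also avoids calling table(a) three times.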

User · Answer

If you are asking for a built-in function in R, maybe you can find it in the package pracma. Inside that package, there is a function called Mode.

User · Answer

Calculating the mode mostly matters for factor variables; in that case we can use

    labels(table(HouseVotes84$V1)[as.numeric(labels(max(table(HouseVotes84$V1))))])

HouseVotes84 is a dataset available in the 'mlbench' package.

It will give the max label value. It is easier to use built-in functions than to write your own function.

User · Answer

R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

However, the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use some construct like the following (and to turn this into a function if you use it often):

    mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
    tabSmpl <- tabulate(mySamples)
    SmplMode <- which(tabSmpl == max(tabSmpl))
    if (sum(tabSmpl == max(tabSmpl)) > 1) SmplMode <- NA
    > SmplMode
    [1] 19

For a bigger sample list, one should consider using a temporary variable for the max(tabSmpl) value (I don't know whether R would automatically optimize this).

Reference: see "How about median and mode?" in this KickStarting R lesson. This seems to confirm that (at least as of the writing of this lesson) there isn't a mode function in R (well... mode(), as you found out, is used for asserting the type of variables).
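A sketch of the same construct with the temporary variable the answer suggests. One caveat worth knowing: tabulate() works on positive integers only and silently ignores anything below 1, so this trick assumes such data. Names other than mySamples, tabSmpl and SmplMode are illustrative.

```r
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

tabSmpl  <- tabulate(mySamples)    # tabSmpl[k] is the number of times k occurs
maxCount <- max(tabSmpl)           # computed once and reused
SmplMode <- which(tabSmpl == maxCount)
if (length(SmplMode) > 1) SmplMode <- NA   # more than one mode: report NA

# Values below 1 are dropped without warning
shifted <- tabulate(c(-1, 0, 2, 2))   # counts for bins 1 and 2 only: 0 2

print(SmplMode)
```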

User · Answer

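Here are several ways you can do it in Theta(N) running time. Note that only mode1 makes a single pass; mode2 and mode3 call L.count once per distinct value, so they are linear only when the number of distinct values is bounded.

    from collections import defaultdict

    def mode1(L):
        counts = defaultdict(int)
        for v in L:
            counts[v] += 1
        return max(counts, key=lambda x: counts[x])

    def mode2(L):
        vals = set(L)
        return max(vals, key=lambda x: L.count(x))

    def mode3(L):
        return max(set(L), key=lambda x: L.count(x))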

User · Answer

There is the package modeest, which provides estimators of the mode of univariate unimodal (and sometimes multimodal) data and values of the modes of usual probability distributions.

    mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

    library(modeest)
    mlv(mySamples, method = "mfv")

    Mode (most likely value): 19
    Bickel's modal skewness: -0.1
    Call: mlv.default(x = mySamples, method = "mfv")

For more information see this page.

User · Answer

There are multiple solutions provided for this one. I checked the first one and after that wrote my own. Posting it here if it helps anyone:

    Mode <- function(x) {
      y <- data.frame(table(x))
      y[y$Freq == max(y$Freq), 1]
    }

Let's test it with a few examples. I am taking the iris data set. Let's test with numeric data:

    > Mode(iris$Sepal.Length)
    [1] 5

which you can verify is correct.

Now the only non-numeric field in the iris dataset (Species) does not have a mode. Let's test with our own example:

    > test <- c("red", "red", "green", "blue", "red")
    > Mode(test)
    [1] red

EDIT: As mentioned in the comments, the user might want to preserve the input type. In which case the mode function can be modified to:

    Mode <- function(x) {
      y <- data.frame(table(x))
      z <- y[y$Freq == max(y$Freq), 1]
      as(as.character(z), class(x))
    }

The last line of the function simply coerces the final mode value to the type of the original input.

User · Answer

You could also calculate the number of times an instance has happened in your set and find the max number, e.g.

    > temp <- table(as.vector(x))
    > names(temp)[temp == max(temp)]
    [1] "1"
    > as.data.frame(table(x))
      r5050 Freq
    1     0   13
    2     1   15
    3     2    6
    >

User · Answer

While I like Ken Williams' simple function, I would like to retrieve the multiple modes if they exist. With that in mind, I use the following function, which returns a list of the modes if multiple, or the single one.

    rmode <- function(x) {
      x <- sort(x)
      u <- unique(x)
      y <- lapply(u, function(y) length(x[x == y]))
      u[which(unlist(y) == max(unlist(y)))]
    }

User · Answer

It seems to me that if a collection has a mode, then its elements can be mapped one-to-one with the natural numbers. So, the problem of finding the mode reduces to producing such a mapping, finding the mode of the mapped values, then mapping back to some of the items in the collection. (Dealing with NA occurs at the mapping phase.)

I have a histogram function that operates on a similar principle. (The special functions and operators used in the code presented herein should be defined in Shapiro and/or the neatOveRse. The portions of Shapiro and neatOveRse duplicated herein are so duplicated with permission; the duplicated snippets may be used under the terms of this site.) R pseudocode for histogram is

    .histogram <- function (i)
            if (i %|% is.empty) integer() else
            vapply2(i %|% max %|% seqN, `==` %<=% i %O% sum)

    histogram <- function(i) i %|% rmna %|% .histogram

(The special binary operators accomplish piping, currying, and composition.) I also have a maxloc function, which is similar to which.max, but returns all the absolute maxima of a vector. R pseudocode for maxloc is

    FUNloc <- function (FUN, x, na.rm=F)
            which(x == list(identity, rmna)[[na.rm %|% index.b]](x) %|% FUN)

    maxloc <- FUNloc %<=% max

    minloc <- FUNloc %<=% min # I'M THROWING IN minloc TO EXPLAIN WHY I MADE FUNloc

Then

    imode <- histogram %O% maxloc

and

    x %|% map %|% imode %|% unmap

will compute the mode of any collection, provided appropriate map-ping and unmap-ping functions are defined.

User · Answer

A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continuous univariate distribution (e.g. a normal distribution) is defining and using the following function:

    estimate_mode <- function(x) {
      d <- density(x)
      d$x[which.max(d$y)]
    }

Then to get the mode estimate:

    x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
    estimate_mode(x)
    ## 5.439788
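To see how the density-based estimate behaves, here is a sketch on simulated data where the true mode is known to be 0. The seed, sample size and the adjust value are arbitrary choices for illustration.

```r
estimate_mode <- function(x) {
  d <- density(x)          # kernel density estimate with default bandwidth
  d$x[which.max(d$y)]      # grid point where the estimated density peaks
}

set.seed(42)
sim <- rnorm(10000, mean = 0, sd = 1)   # standard normal, true mode at 0

est <- estimate_mode(sim)

# A wider bandwidth (adjust > 1) smooths more and can move the peak slightly
d_smooth   <- density(sim, adjust = 2)
est_smooth <- d_smooth$x[which.max(d_smooth$y)]

print(c(est, est_smooth))
```

The result is a point on the evaluation grid used by density(), so it is an estimate rather than one of the observed values.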

User · Answer

I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution:

    function(x) density(x, 2)$x[density(x, 2)$y == max(density(x, 2)$y)]

where x is the data collection. Pay attention to the adjust parameter of the density function, which regulates the smoothing.

User · Answer

Found this on the R mailing list, hope it's helpful. It is also what I was thinking anyway. You'll want to table() the data, sort and then pick the first name. It's hackish but should work.

    names(sort(-table(x)))[1]

User · Answer

One more solution, which works for both numeric and character/factor data:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

On my dinky little machine, that can generate and find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

    Modes <- function(x) {
      ux <- unique(x)
      tab <- tabulate(match(x, ux))
      ux[tab == max(tab)]
    }
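A brief sketch of how the two variants differ on tied data. The functions are repeated from this answer so the snippet runs on its own; the input vectors are made up.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

v <- c("b", "a", "a", "b", "c")   # "b" and "a" both occur twice

first_mode <- Mode(v)    # "b": the tied value that appears first in v
all_modes  <- Modes(v)   # "b" "a": every tied value, in order of appearance

int_mode <- Mode(c(3L, 3L, 1L))   # 3L: the input type is preserved
```

Because the result is indexed out of unique(x), no conversion through names() is involved and the original type survives.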

User · Answer

An easy way to calculate the MODE of a vector 'v' containing discrete values is:

    names(sort(table(v)))[length(sort(table(v)))]

User · Answer

I can't vote yet, but Rasmus Bååth's answer is what I was looking for. However, I would modify it a bit to allow constraining the distribution, for example for values only between 0 and 1.

    estimate_mode <- function(x, from = min(x), to = max(x)) {
      d <- density(x, from = from, to = to)
      d$x[which.max(d$y)]
    }

Be aware that you may not want to constrain your distribution at all; in that case set from = -"BIG NUMBER", to = "BIG NUMBER".

User · Answer

This builds on jprockbelly's answer, by adding a speed-up for very short vectors. This is useful when applying mode to a data.frame or data.table with lots of small groups:

    Mode <- function(x) {
      if (length(x) <= 2) return(x[1])
      if (anyNA(x)) x = x[!is.na(x)]
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

User · Answer

Sorry, I might take it too simple, but doesn't this do the job? (in 1.3 secs for 1E6 values on my machine):

    t0 <- Sys.time()
    summary(as.factor(round(rnorm(1e6), 2)))[1]
    Sys.time() - t0

You just have to replace the "round(rnorm(1e6), 2)" with your vector.

User · Answer

Mode can't be useful in every situation. So the function should address this situation. Try the following function.

    Mode <- function(v) {
      # checking unique numbers in the input
      uniqv <- unique(v)
      # frequency of most occurred value in the input data
      m1 <- max(tabulate(match(v, uniqv)))
      n <- length(tabulate(match(v, uniqv)))
      # if all elements are same
      same_val_check <- all(diff(v) == 0)
      if (same_val_check == F) {
        # frequency of second most occurred value in the input data
        m2 <- sort(tabulate(match(v, uniqv)), partial = n - 1)[n - 1]
        if (m1 != m2) {
          # Returning the most repeated value
          mode <- uniqv[which.max(tabulate(match(v, uniqv)))]
        } else {
          mode <- "Two or more values have same frequency. So mode can't be calculated."
        }
      } else {
        # if all elements are same
        mode <- unique(v)
      }
      return(mode)
    }

Output:

    x1 <- c(1, 2, 3, 3, 3, 4, 5)
    Mode(x1)
    # [1] 3

    x2 <- c(1, 2, 3, 4, 5)
    Mode(x2)
    # [1] "Two or more values have same frequency. So mode can't be calculated."

    x3 <- c(1, 1, 2, 3, 3, 4, 5)
    Mode(x3)
    # [1] "Two or more values have same frequency. So mode can't be calculated."

User · Answer

The following function comes in three forms:

method = "mode" [default]: calculates the mode for a unimodal vector, else returns an NA
method = "nmodes": calculates the number of modes in the vector
method = "modes": lists all the modes for a unimodal or polymodal vector

    modeav <- function(x, method = "mode", na.rm = FALSE) {
      x <- unlist(x)
      if (na.rm)
        x <- x[!is.na(x)]
      u <- unique(x)
      n <- length(u)
      # get frequencies of each of the unique values in the vector
      frequencies <- rep(0, n)
      for (i in seq_len(n)) {
        if (is.na(u[i])) {
          frequencies[i] <- sum(is.na(x))
        } else {
          frequencies[i] <- sum(x == u[i], na.rm = TRUE)
        }
      }
      # mode if a unimodal vector, else NA
      if (method == "mode" | is.na(method) | method == "") {
        return(ifelse(length(frequencies[frequencies == max(frequencies)]) > 1, NA, u[which.max(frequencies)]))
      }
      # number of modes
      if (method == "nmode" | method == "nmodes") {
        return(length(frequencies[frequencies == max(frequencies)]))
      }
      # list of all modes
      if (method == "modes" | method == "modevalues") {
        return(u[which(frequencies == max(frequencies), arr.ind = FALSE, useNames = FALSE)])
      }
      # error trap the method
      warning("Warning: method not recognised.  Valid methods are 'mode' [default], 'nmodes' and 'modes'")
      return()
    }

User · Answer

A small modification to Ken Williams' answer, adding optional params na.rm and return_multiple.

Unlike the answers relying on names(), this answer maintains the data type of x in the returned value(s).

    stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
      if (na.rm) {
        x <- na.omit(x)
      }
      ux <- unique(x)
      freq <- tabulate(match(x, ux))
      mode_loc <- if (return_multiple) which(freq == max(freq)) else which.max(freq)
      return(ux[mode_loc])
    }

To show it works with the optional params and maintains data type:

    foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
    bar <- c('mouse', 'mouse', 'dog', 'cat', 'cat', 'bird', NA, NA)

    str(stat_mode(foo))                                # int [1:3] 2 4 NA
    str(stat_mode(bar))                                # chr [1:3] "mouse" "cat" NA
    str(stat_mode(bar, na.rm = T))                     # chr [1:2] "mouse" "cat"
    str(stat_mode(bar, return_mult = F, na.rm = T))    # chr "mouse"

Thanks to @Frank for simplification.

User · Answer

Another simple option that gives all values ordered by frequency is to use rle:

    df = as.data.frame(unclass(rle(sort(mySamples))))
    df = df[order(-df$lengths), ]
    head(df)
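A self-contained sketch of the rle idea, assuming the same mySamples vector used in other answers on this page. The names runs, top_value and top_count are illustrative.

```r
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

# After sorting, equal values are adjacent, so each run is one distinct value
runs <- rle(sort(mySamples))
df <- data.frame(lengths = runs$lengths, values = runs$values)

# Order by run length, most frequent first
df <- df[order(-df$lengths), ]

top_value <- df$values[1]    # 19
top_count <- df$lengths[1]   # 3

print(head(df))
```

The first row is the mode with its count, and the remaining rows give the full frequency ranking.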

User · Answer

I was looking through all these options and started to wonder about their relative features and performance, so I did some tests. In case anyone else is curious about the same, I'm sharing my results here.

Not wanting to bother about all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on character, factor, logical and numeric vectors, it should deal with NAs and other problematic values appropriately, and output should be 'sensible', i.e. no numerics as character or other such silliness.

I also added a function of my own, which is based on the same rle idea as chrispy's, except adapted for more general use:

    library(magrittr)

    Aksel <- function(x, freq = FALSE) {
        z <- 2
        if (freq) z <- 1:2
        run <- x %>% as.vector %>% sort %>% rle %>% unclass %>% data.frame
        colnames(run) <- c("freq", "value")
        run[which(run$freq == max(run$freq)), z] %>% as.vector
    }

    set.seed(2)

    F <- sample(c("yes", "no", "maybe", NA), 10, replace = TRUE) %>% factor
    Aksel(F)

    # [1] "maybe" "yes"

    C <- sample(c("Steve", "Jane", "Jonas", "Petra"), 20, replace = TRUE)
    Aksel(C, freq = TRUE)

    # freq value
    #    7 Steve

I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors.

Chris' function was set to method = "modes" and na.rm = TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.

In matter of speed alone Ken's version wins handily, but it is also the only one of these that will only report one mode, no matter how many there really are. As is often the case, there's a trade-off between speed and versatility. In method = "mode", Chris' version will return a value iff there is one mode, else NA. I think that's a nice touch. I also think it's interesting how some of the functions are affected by an increased number of unique values, while others aren't nearly as much. I haven't studied the code in detail to figure out why that is, apart from eliminating logical/numeric as the cause.

User · Answer

This hack should work fine. It gives you the value as well as the count of the mode:

    Mode <- function(x) {
      a = table(x)  # x is a vector
      return(a[which.max(a)])
    }
