Fastest way to find second (third...) highest/lowest value in vector or column

Question

R offers max and min, but I do not see a really fast way to find another value in the order, apart from sorting the whole vector and then picking a value x from this vector. Is there a faster way to get the second highest value, for example?

User · Answer

I wrapped Rob's answer up into a slightly more general function, which can be used to find the 2nd, 3rd, 4th (etc.) max:

    maxN <- function(x, N=2){
      len <- length(x)
      if(N>len){
        warning('N greater than length(x).  Setting N=length(x)')
        N <- length(x)
      }
      # partial sorting only guarantees that position len-N+1 is correct
      sort(x,partial=len-N+1)[len-N+1]
    }

    maxN(1:10)
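The same partial-sort idea can be mirrored for the N-th smallest value. The following is only a sketch: the name minN is hypothetical (it is not part of the answer above), and it assumes x is a numeric vector without NA values.

```r
# Hypothetical counterpart to maxN: N-th smallest value via partial sorting.
# Assumes x is a numeric vector with no NA values.
minN <- function(x, N = 2) {
  len <- length(x)
  if (N > len) {
    warning("N greater than length(x).  Setting N=length(x)")
    N <- len
  }
  # partial = N places only the N-th element in its final sorted position
  sort(x, partial = N)[N]
}

x <- c(12, 45, 34, 4, 0, -234, 45, 6, 4)
minN(x, 1)  # -234
minN(x, 2)  # 0
```

Counting from the bottom simply uses position N instead of len-N+1.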

User · Answer

Slightly slower alternative, just for the record:

    x <- c(12.45,34,4,0,-234,45.6,4)
    max( x[x!=max(x)] )
    min( x[x!=min(x)] )
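One thing worth keeping in mind with this approach is how it treats ties: filtering out every copy of the maximum yields the second highest distinct value, which can differ from the element in second-to-last sorted position. A small illustration, using made-up data:

```r
# Made-up vector in which the maximum occurs twice
x <- c(5, 5, 3)
n <- length(x)

# Drops both 5s first, so this is the second highest *distinct* value
second_distinct <- max(x[x != max(x)])

# Takes the element in position n-1 of the sorted vector, ties included
second_by_position <- sort(x, partial = n - 1)[n - 1]

second_distinct     # 3
second_by_position  # 5
```

If every element of x is equal, x[x != max(x)] is empty and max() returns -Inf with a warning.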

User · Answer

    topn = function(vector, n){
      maxs=c()
      ind=c()
      for (i in 1:n){
        biggest=match(max(vector), vector)
        ind[i]=biggest
        maxs[i]=max(vector)
        vector=vector[-biggest]
      }
      mat=cbind(maxs, ind)
      return(mat)
    }

This function will return a matrix with the top n values and their indices. Hope it helps. VDevi-Chou
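Because the loop above removes each maximum from the vector before searching again, the indices it reports from the second row onward refer to the shortened vector rather than the original one. If positions in the original vector are wanted, a sketch along the following lines may help. The name topn_keep_index is hypothetical, and it swaps the loop for order(), which sorts the whole vector.

```r
# Hypothetical variant: top n values with their positions in the ORIGINAL vector.
# Assumes a numeric vector without NA values and n >= 1.
topn_keep_index <- function(vector, n) {
  n <- min(n, length(vector))
  ind <- order(vector, decreasing = TRUE)[seq_len(n)]
  cbind(maxs = vector[ind], ind = ind)
}

v <- c(3, 9, 1, 7)
res <- topn_keep_index(v, 2)
res
# first row:  9 at position 2
# second row: 7 at position 4 (the loop version reports 3 here)
```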

User · Answer

This will find the index of the N'th smallest or largest value in the input numeric vector x. Set bottom=TRUE in the arguments if you want the N'th from the bottom, or bottom=FALSE if you want the N'th from the top. N=1 and bottom=TRUE is equivalent to which.min, N=1 and bottom=FALSE is equivalent to which.max.

    FindIndicesBottomTopN <- function(x=c(4,-2,5,-77,99),N=1,bottom=FALSE){

      k1 <- rank(x)
      if(bottom==TRUE){
        Nindex <- which(k1==N)
        Nindex <- Nindex[1]
      }

      if(bottom==FALSE){
        Nindex <- which(k1==(length(x)+1-N))
        Nindex <- Nindex[1]
      }

      return(Nindex)
    }
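With tied values, rank() returns averaged ranks by default (for example 1.5 and 1.5), so the exact comparison against N can match nothing and the function above then returns NA. A possible alternative, sketched here under the assumption that any one of the tied positions is acceptable, uses order() instead. The name nth_index is hypothetical.

```r
# Hypothetical alternative based on order(); ties are broken by position.
# Assumes x is numeric without NA values and 1 <= N <= length(x).
nth_index <- function(x, N = 1, bottom = FALSE) {
  order(x, decreasing = !bottom)[N]
}

x <- c(4, -2, 5, -77, 99)
nth_index(x, 1)                 # 5, same as which.max(x)
nth_index(x, 2)                 # 3, position of the value 5
nth_index(x, 2, bottom = TRUE)  # 2, position of the value -2

# Tied values: rank() gives 1.5 1.5 3, but order() still returns a position
nth_index(c(1, 1, 2), 1, bottom = TRUE)  # 1
```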

User · Answer

I found that removing the max element first and then taking another max runs at comparable speed:

    system.time({a=runif(1000000);m=max(a);i=which.max(a);b=a[-i];max(b)})
       user  system elapsed
      0.092   0.000   0.659

    system.time({a=runif(1000000);n=length(a);sort(a,partial=n-1)[n-1]})
       user  system elapsed
      0.096   0.000   0.653

User · Answer

When I was recently looking for an R function returning indexes of the top N max/min numbers in a given vector, I was surprised there is no such function. And this is something very similar.

The brute force solution, a full sort with base::sort, seems to be the easiest one.

    topMaxUsingFullSort <- function(x, N) {
      sort(x, decreasing = TRUE)[1:min(N, length(x))]
    }

But it is not the fastest one in case your N value is relatively small compared to the length of the vector x.

On the other hand, if N is really small, you can use the base::which.max function iteratively, and in each iteration replace the found value with -Inf.

    # the input vector 'x' must not contain -Inf value
    topMaxUsingWhichMax <- function(x, N) {
      vals <- c()
      for(i in 1:min(N, length(x))) {
        idx      <- which.max(x)
        vals     <- c(vals, x[idx]) # copy-on-modify (this is not an issue because vals is a relatively small vector)
        x[idx]   <- -Inf            # copy-on-modify (this is the issue because the data vector could be huge)
      }
      vals
    }

I believe you see the problem: the copy-on-modify nature of R. So this will perform better for very, very small N (1, 2, 3), but it will rapidly slow down for larger N values. And you are iterating over all elements in vector x N times.

I think the best solution in clean R is to use a partial base::sort.

    topMaxUsingPartialSort <- function(x, N) {
      N <- min(N, length(x))
      x[x >= -sort(-x, partial=N)[N]][1:N]
    }

Then you can select the last (Nth) item from the result of the functions defined above.

Note: the functions defined above are just examples. If you want to use them, you have to sanity-check the inputs (e.g. N > length(x)).

I wrote a small article about something very similar (get indexes of top N max/min values of a vector) at http://palusga.cz/?p=18 where you can find some benchmarks of functions similar to the ones I defined above.
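One caveat with the partial-sort version: it returns matching elements in their original order, so its last element is not necessarily the N-th largest, and ties at the threshold can crowd out larger values. If the top N values are needed in descending order, one option is to pass several positions to partial. This is only a sketch; the name topMaxSorted is hypothetical and the function assumes N >= 1 and no NA values.

```r
# Hypothetical variant: top N values in descending order via partial sorting.
# partial = 1:N puts each of the first N positions in its final sorted place.
topMaxSorted <- function(x, N) {
  N <- min(N, length(x))
  idx <- seq_len(N)
  -sort(-x, partial = idx)[idx]
}

top2 <- topMaxSorted(c(5, 5, 9), 2)
top2     # 9 5
top2[2]  # 5, the 2nd largest value
```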

User · Answer

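dplyr has the function nth, where the first argument is the vector and the second is which place you want. This goes for repeating elements as well. For example:

    x = c(1,2, 8, 16, 17, 20, 1, 20)

Finding the second largest value:

    nth(unique(x),length(unique(x))-1)

    [1] 17

Note that nth picks by position, so this relies on the unique values of x already being in ascending order, as they are here.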

User · Answer

Here is the simplest way I found:

    num <- c(5665,1615,5154,65564,69895646)

    num <- sort(num, decreasing = FALSE)

    tail(num, 1)                           # Highest number
    head(tail(num, 2),1)                   # Second highest number
    head(tail(num, 3),1)                   # Third highest number
    head(tail(num, n),1)                   # General expression for the nth highest number

User · Answer

Rfast has a function called nth_element that does exactly what you ask and is faster than all of the implementations discussed above.

Also, the methods discussed above that are based on partial sort don't support finding the k smallest values.

Disclaimer: An issue appears to occur when dealing with integers, which can be bypassed by using as.numeric (e.g. Rfast::nth(as.numeric(1:10), 2)), and will be addressed in the next update of Rfast.

    Rfast::nth(x, 5, descending = T)

will return the 5th largest element of x, while

    Rfast::nth(x, 5, descending = F)

will return the 5th smallest element of x.

Benchmarks below against the most popular answers.

For 10 thousand numbers:

    N = 10000
    x = rnorm(N)

    maxN <- function(x, N=2){
        len <- length(x)
        if(N>len){
            warning('N greater than length(x).  Setting N=length(x)')
            N <- length(x)
        }
        sort(x,partial=len-N+1)[len-N+1]
    }

    microbenchmark::microbenchmark(
        Rfast = Rfast::nth(x,5,descending = T),
        maxN = maxN(x,5),
        order = x[order(x, decreasing = T)[5]])

    Unit: microseconds
      expr      min       lq      mean   median        uq       max neval
     Rfast  160.364  179.607  202.8024  194.575  210.1830   351.517   100
      maxN  396.419  423.360  559.2707  446.452  487.0775  4949.452   100
     order 1288.466 1343.417 1746.7627 1433.221 1500.7865 13768.148   100

For 1 million numbers:

    N = 1e6
    x = rnorm(N)

    microbenchmark::microbenchmark(
        Rfast = Rfast::nth(x,5,descending = T),
        maxN = maxN(x,5),
        order = x[order(x, decreasing = T)[5]])

    Unit: milliseconds
      expr      min        lq      mean   median        uq       max neval
     Rfast  89.7722  93.63674  114.9893 104.6325  120.5767  204.8839   100
      maxN 150.2822 207.03922  235.3037 241.7604  259.7476  336.7051   100
     order 930.8924 968.54785 1005.5487 991.7995 1031.0290 1164.9129   100

User · Answer

You can use the sort function like this:

    sort(unique(c))[1:N]

Example:

    c <- c(4,2,44,2,1,45,34,2,4,22,244)
    sort(unique(c), decreasing = TRUE)[1:5]

will give the 5 largest distinct numbers.

User · Answer

You can identify the next higher value with cummax(). If you want the location of each new higher value, for example, you can pass your vector of cummax() values to the diff() function to identify locations at which the cummax() value changed. Say we have the vector

    v <- c(4,6,3,2,-5,6,8,12,16)

cummax(v) will give us the vector

    4  6  6  6  6  6  8 12 16

Now, if you want to find the location of a change in cummax(), you have many options; I tend to use sign(diff(cummax(v))). You have to adjust for the lost first element because of diff(). The complete code for vector v would be:

    which(sign(diff(cummax(v)))==1)+1
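Working through the same vector step by step may make the index adjustment clearer. The intermediate variable names here are only illustrative:

```r
v <- c(4, 6, 3, 2, -5, 6, 8, 12, 16)

running_max <- cummax(v)        # 4 6 6 6 6 6 8 12 16
steps <- diff(running_max)      # 2 0 0 0 0 2 4 4 (one element shorter than v)

# +1 shifts back to positions in v, since diff() drops the first element
new_max_at <- which(sign(steps) == 1) + 1

new_max_at     # 2 7 8 9
v[new_max_at]  # 6 8 12 16
```

Position 1 is never reported, because diff() has nothing to compare the first element against.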

User · Answer

Here you go... kit is the obvious winner!

    N = 1e6
    x = rnorm(N)

    maxN <- function(x, N=2){
      len <- length(x)
      if(N>len){
        warning('N greater than length(x).  Setting N=length(x)')
        N <- length(x)
      }
      sort(x,partial=len-N+1)[len-N+1]
    }

    microbenchmark::microbenchmark(
      Rfast = Rfast::nth(x,5,descending = T),
      maxN = maxN(x,5),
      order = x[order(x, decreasing = T)[5]],
      kit = x[kit::topn(x, 5L,decreasing = T)[5L]]
    )
    # Unit: milliseconds
    # expr       min        lq     mean    median        uq        max neval
    # Rfast 12.311168 12.473771 16.36982 12.702134 16.110779 102.749873   100
    # maxN  12.922118 13.124358 17.49628 18.977537 20.053139  28.928694   100
    # order 50.443100 50.926975 52.54067 51.270163 52.323116  66.561606   100
    # kit    1.177202  1.216371  1.29542  1.240228  1.297286   2.771715   100

Edit: I forgot that kit::topn has a hasna option... let's do another run.

    microbenchmark::microbenchmark(
      Rfast = Rfast::nth(x,5,descending = T),
      maxN = maxN(x,5),
      order = x[order(x, decreasing = T)[5]],
      kit = x[kit::topn(x, 5L,decreasing = T)[5L]],
      kit2 = x[kit::topn(x, 5L,decreasing = T,hasna = F)[5L]],
      unit = "ms"
    )
    # Unit: milliseconds
    # expr       min        lq       mean     median        uq       max neval
    # Rfast 13.194314 13.358787 14.7227116 13.4560340 14.551194 24.524105   100
    # maxN   7.378960  7.527661 10.0747803  7.7119715 12.217756 67.409526   100
    # order 50.088927 50.488832 52.4714347 50.7415680 52.267003 70.062662   100
    # kit    1.180698  1.217237  1.2975441  1.2429790  1.278243  3.263202   100
    # kit2   0.842354  0.876329  0.9398055  0.9109095  0.944407  2.135903   100

User · Answer

head(sort(x), ...) or tail(sort(x), ...) should work.

User · Answer

Here is an easy way to find the indices of the N smallest/largest values in a vector (example for N = 3):

    N <- 3

N smallest:

    ndx <- order(x)[1:N]

N largest:

    ndx <- order(x, decreasing = T)[1:N]

So you can extract the values as:

    x[ndx]
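The answer leaves x undefined, so here is a small run-through with made-up data to show what the indices and values look like:

```r
# Made-up example vector
x <- c(10, 3, 8, 1, 7)
N <- 3

smallest_idx <- order(x)[1:N]
largest_idx  <- order(x, decreasing = TRUE)[1:N]

smallest_idx     # 4 2 5
x[smallest_idx]  # 1 3 7
largest_idx      # 1 3 5
x[largest_idx]   # 10 8 7
```

Note that order() sorts the full vector, so this is convenient rather than the fastest option for a very long x.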

User · Answer

Use the partial argument of sort(). For the second highest value:

    n <- length(x)
    sort(x,partial=n-1)[n-1]

User · Answer

For the nth highest value:

    sort(x, TRUE)[n]
