[r] Fastest way to find second (third...) highest/lowest value in vector or column

R offers max and min, but I do not see a really fast way to find another value in the order, apart from sorting the whole vector and then picking a value x from this vector.

Is there a faster way to get the second highest value, for example?


This finds the index of the N'th smallest or largest value in the input numeric vector x. Set bottom=TRUE in the arguments if you want the N'th from the bottom, or bottom=FALSE if you want the N'th from the top. When x has no ties, N=1 with bottom=TRUE is equivalent to which.min, and N=1 with bottom=FALSE is equivalent to which.max.

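FindIndicesBottomTopN <- function(x=c(4,-2,5,-77,99),N=1,bottom=FALSE)
{
  # ties.method="first" gives every element a distinct integer rank, so the
  # lookup below always finds a match, even when x contains tied values
  k1 <- rank(x, ties.method="first")

  # rank N counts from the bottom; rank length(x)+1-N counts from the top
  target <- if (bottom) N else length(x)+1-N

  return(which(k1==target)[1])
}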
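
One thing to watch with a rank-based lookup: by default rank() averages the ranks of tied values, so an exact integer rank may not exist. A minimal sketch of how the ties.method argument changes this, using a made-up vector y that is not part of the answer above:

```r
y <- c(10, 30, 30, 20)          # hypothetical data with a tie at the top

r_avg   <- rank(y)                          # default ties.method = "average"
r_first <- rank(y, ties.method = "first")   # ties broken by position

# With averaged ranks, no element has rank length(y), so the lookup is empty
idx_avg   <- which(r_avg == length(y))
# With "first", the ranks are a permutation of 1..length(y), so a match exists
idx_first <- which(r_first == length(y))

print(r_avg)
print(r_first)
```

Here idx_avg is empty while idx_first points at the second of the two tied maxima.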

You can use the sort function like this:

sort(unique(x))[1:N]

Example:

x <- c(4,2,44,2,1,45,34,2,4,22,244)
sort(unique(x), decreasing = TRUE)[1:5]

This returns the 5 largest distinct values.


A slightly slower alternative, just for the record:

x <- c(12.45,34,4,0,-234,45.6,4)
max( x[x!=max(x)] )
min( x[x!=min(x)] )
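
Note that this approach returns the second highest distinct value, which differs from the second element of the sorted vector when the maximum is repeated. A small sketch with made-up data to illustrate the difference:

```r
x <- c(3, 9, 9, 7)   # hypothetical data: the maximum appears twice

second_distinct <- max(x[x != max(x)])            # drops every copy of the max first
second_by_pos   <- sort(x, decreasing = TRUE)[2]  # second element of the sorted vector

print(second_distinct)  # 7
print(second_by_pos)    # 9
```

Which of the two is "the second highest" depends on whether ties should count as separate values.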

When I was recently looking for an R function returning the indexes of the top N max/min numbers in a given vector, I was surprised that there is no such function.

And this is something very similar.

The brute force solution using the base::sort function seems to be the easiest one.

topMaxUsingFullSort <- function(x, N) {
  sort(x, decreasing = TRUE)[1:min(N, length(x))]
}

But it is not the fastest one when N is small relative to the length of the vector x.

On the other hand, if N is really small, you can call base::which.max iteratively and, in each iteration, replace the value found with -Inf.

# the input vector 'x' must not contain -Inf value 
topMaxUsingWhichMax <- function(x, N) {
  vals <- c()
  for(i in 1:min(N, length(x))) {
    idx      <- which.max(x)
    vals     <- c(vals, x[idx]) # copy-on-modify (not an issue because vals is a relatively small vector)
    x[idx]   <- -Inf            # copy-on-modify (this is the issue because the data vector could be huge)
  }
  vals
}

I believe you see the problem: the copy-on-modify nature of R. So this performs better for very small N (1, 2, 3), but it slows down rapidly for larger N, because you iterate over all elements of x N times.

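I think the best solution in plain base R is to use a partial base::sort.

topMaxUsingPartialSort <- function(x, N) {
  N <- min(N, length(x))
  # a partial sort of -x puts the N-th largest value of x at position N
  threshold <- -sort(-x, partial=N)[N]
  # keep the candidates >= threshold, then fully sort only this small subset
  sort(x[x >= threshold], decreasing = TRUE)[1:N]
}

Then you can select the last (Nth) item from the result of any of the functions defined above.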

Note: the functions defined above are just examples. If you want to use them, you have to validate the inputs first (e.g. N < 1 or N > length(x)).
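
As a rough sketch of what such input checks might look like, here is a hypothetical wrapper (topMaxChecked is a made-up name, and dropping NA values is an assumption; you may prefer to raise an error instead):

```r
# Hypothetical helper: validates the inputs, then returns the N largest values
topMaxChecked <- function(x, N) {
  stopifnot(is.numeric(x), is.numeric(N), length(N) == 1, !is.na(N))
  x <- x[!is.na(x)]              # assumption: missing values are simply dropped
  if (N < 1 || length(x) == 0) return(numeric(0))
  N <- min(as.integer(N), length(x))   # cap N at the number of usable values
  sort(x, decreasing = TRUE)[seq_len(N)]
}

print(topMaxChecked(c(5, NA, 1, 9), 2))   # 9 5
print(topMaxChecked(c(5, 1, 9), 10))      # 9 5 1
print(topMaxChecked(c(5, 1, 9), 0))       # numeric(0)
```

Using seq_len(N) instead of 1:N avoids the classic trap where 1:0 expands to c(1, 0).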

I wrote a small article about something very similar (getting the indexes of the top N max/min values of a vector) at http://palusga.cz/?p=18, where you can find some benchmarks of functions similar to those defined above.


You can identify each new running maximum with cummax(). If you want the location of each new higher value, for example, pass the cummax() values to diff() to find the positions at which the running maximum changed. Say we have the vector

v <- c(4,6,3,2,-5,6,8,12,16)

cummax(v) will give us the vector

4  6  6  6  6  6  8 12 16

Now, if you want to find the locations of a change in cummax(), you have many options; I tend to use sign(diff(cummax(v))). You have to add 1 to adjust for the first element lost by diff(). The complete code for vector v would be:

which(sign(diff(cummax(v)))==1)+1
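
For illustration, the same steps can be spelled out with intermediate variables. This sketch uses diff(...) > 0 in place of sign(...) == 1, which selects the same positions; the variable names are made up:

```r
v <- c(4, 6, 3, 2, -5, 6, 8, 12, 16)

running_max <- cummax(v)                  # 4 6 6 6 6 6 8 12 16
# diff() is positive exactly where the running maximum increases;
# +1 compensates for diff() returning one element fewer than its input
new_high_at <- which(diff(running_max) > 0) + 1

print(new_high_at)      # 2 7 8 9
print(v[new_high_at])   # 6 8 12 16
```

The first element never appears in the result, because diff() has nothing to compare it with.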

For nth highest value,

sort(x, TRUE)[n]

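topn = function(vector, n){
  maxs=numeric(n)
  ind=integer(n)
  for (i in seq_len(n)){
    biggest=which.max(vector)   # position of the current maximum
    ind[i]=biggest
    maxs[i]=vector[biggest]
    # mask the value instead of removing it, so that later indices
    # still refer to positions in the original vector
    vector[biggest]=-Inf
  }
  mat=cbind(maxs, ind)
  return(mat)
}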

This function returns a matrix with the top n values and their indices. Hope it helps. VDevi-Chou


I found that removing the max element first and then taking another max runs at a comparable speed:

system.time({a=runif(1000000);m=max(a);i=which.max(a);b=a[-i];max(b)})
   user  system elapsed 
  0.092   0.000   0.659 

system.time({a=runif(1000000);n=length(a);sort(a,partial=n-1)[n-1]})
   user  system elapsed 
  0.096   0.000   0.653 

Here is the simplest way I found,

num <- c(5665,1615,5154,65564,69895646)

num <- sort(num, decreasing = FALSE)

tail(num, 1)                           # Highest number
head(tail(num, 2),1)                   # Second highest number
head(tail(num, 3),1)                   # Third highest number
head(tail(num, n),1)                   # General form: n-th highest number, for any n <= length(num)

dplyr has the function nth, where the first argument is the vector and the second is the position you want. Because nth picks by position rather than by rank, sort the vector first; wrapping it in unique() handles repeated elements. For example:

library(dplyr)

x = c(1,2, 8, 16, 17, 20, 1, 20)

Finding the second largest value:

 nth(sort(unique(x)),length(unique(x))-1)

[1] 17

Here you go... kit is the obvious winner!

N = 1e6
x = rnorm(N)

maxN <- function(x, N=2){
  len <- length(x)
  if(N>len){
    warning('N greater than length(x).  Setting N=length(x)')
    N <- length(x)
  }
  sort(x,partial=len-N+1)[len-N+1]
}

microbenchmark::microbenchmark(
  Rfast = Rfast::nth(x,5,descending = T),
  maxN = maxN(x,5),
  order = x[order(x, decreasing = T)[5]],
  kit = x[kit::topn(x, 5L,decreasing = T)[5L]]
) 
# Unit: milliseconds
# expr       min        lq     mean    median        uq        max neval
# Rfast 12.311168 12.473771 16.36982 12.702134 16.110779 102.749873   100
# maxN  12.922118 13.124358 17.49628 18.977537 20.053139  28.928694   100
# order 50.443100 50.926975 52.54067 51.270163 52.323116  66.561606   100
# kit    1.177202  1.216371  1.29542  1.240228  1.297286   2.771715   100

Edit: I forgot that kit::topn has a hasna option... let's do another run.

microbenchmark::microbenchmark(
  Rfast = Rfast::nth(x,5,descending = T),
  maxN = maxN(x,5),
  order = x[order(x, decreasing = T)[5]],
  kit = x[kit::topn(x, 5L,decreasing = T)[5L]],
  kit2 = x[kit::topn(x, 5L,decreasing = T,hasna = F)[5L]],
  unit = "ms"
) 
# Unit: milliseconds
# expr       min        lq       mean     median        uq       max neval
# Rfast 13.194314 13.358787 14.7227116 13.4560340 14.551194 24.524105   100
# maxN   7.378960  7.527661 10.0747803  7.7119715 12.217756 67.409526   100
# order 50.088927 50.488832 52.4714347 50.7415680 52.267003 70.062662   100
# kit    1.180698  1.217237  1.2975441  1.2429790  1.278243  3.263202   100
# kit2   0.842354  0.876329  0.9398055  0.9109095  0.944407  2.135903   100

head(sort(x), ...) or tail(sort(x), ...) should work.


Here is an easy way to find the indices of the N smallest/largest values in a vector (example for N = 3):

N <- 3

N Smallest:

ndx <- order(x)[1:N]

N Largest:

ndx <- order(x, decreasing = TRUE)[1:N]

So you can extract the values as:

x[ndx]
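
To make this concrete, here is a short run on a made-up vector (the data and variable names are only for illustration):

```r
x <- c(40, 10, 30, 10, 50)   # hypothetical data

N <- 3
smallest_idx <- order(x)[1:N]                      # 2 4 3
largest_idx  <- order(x, decreasing = TRUE)[1:N]   # 5 1 3

print(x[smallest_idx])   # 10 10 30
print(x[largest_idx])    # 50 40 30
```

Unlike the sort(unique(x)) approach, tied values each keep their own index here.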

Use the partial argument of sort(). For the second highest value:

n <- length(x)
sort(x,partial=n-1)[n-1]
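
A small sketch of what the partial argument guarantees, using seeded random data that is only an assumption for the demonstration:

```r
set.seed(1)                    # hypothetical data, seeded for repeatability
x <- runif(1000)
n <- length(x)

k <- n - 1                     # position of the second highest value in sorted order
p <- sort(x, partial = k)

# Only position k is guaranteed to hold its fully sorted value:
# everything before it is <= p[k], everything after it is >= p[k]
second_highest <- p[k]

print(second_highest == sort(x)[k])       # TRUE
print(all(p[seq_len(k - 1)] <= p[k]))     # TRUE
print(p[n] >= p[k])                       # TRUE
```

Because the rest of the vector is left unordered, the partial sort can skip most of the work a full sort would do.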

I wrapped Rob's answer up into a slightly more general function, which can be used to find the 2nd, 3rd, 4th (etc.) max:

maxN <- function(x, N=2){
  len <- length(x)
  if(N>len){
    warning('N greater than length(x).  Setting N=length(x)')
    N <- length(x)
  }
  sort(x,partial=len-N+1)[len-N+1]
}

maxN(1:10)