[r] How to find the highest value of a column in a data frame in R?

I have the following data frame which I called ozone:

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9

I would like to extract the highest value from each of Ozone, Solar.R, Wind...

Also, if possible, how would I sort Solar.R or any other column of this data frame in descending order?

I tried

max(ozone, na.rm=T)

which gives me the highest value in the dataset.

I have also tried

max(subset(ozone,Ozone))

but got the error "'subset' must be logical".

I can set an object to hold the subset of each column, by the following commands

ozone <- subset(ozone, Ozone >0)
max(ozone,na.rm=T) 

but it gives the same value of 334, which is the max value of the data frame, not the column.

Any help would be great, thanks.


The answer is


Similar to colMeans, colSums, etc, you could write a column maximum function, colMax, and a column sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

I use ... in the second function in hopes of sparking your intrigue.

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9")

Use colMax function on sample data:

colMax(dat)
#  Ozone Solar.R    Wind    Temp   Month     Day 
#   41.0   313.0    20.1    74.0     5.0     9.0

To do the sorting on a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118  99  19

and over all columns use our colSort function,

colSort(dat, decreasing = TRUE) ## compare with '...' above
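As a small illustration of what the ... in colSort does, here is a sketch on a made-up data frame (df and its columns a and b are hypothetical, not the question's data). Whatever extra arguments you give colSort are handed straight to sort(). Note that sort() drops NA by default, so columns with different numbers of missing values come back as a list rather than a matrix.

```r
# Hypothetical sample data: column b contains one NA
df <- data.frame(a = c(3, 1, 2), b = c(NA, 5, 4))

colSort <- function(data, ...) sapply(data, sort, ...)

# decreasing = TRUE is not an argument of colSort itself;
# it travels through `...` into each call to sort()
res_list <- colSort(df, decreasing = TRUE)
res_list$a  # 3 2 1
res_list$b  # 5 4  (the NA was dropped, so the lengths differ and we get a list)

# na.last = TRUE is also forwarded to sort(); it keeps NA at the end,
# so every column has the same length and sapply() returns a matrix
res_mat <- colSort(df, decreasing = TRUE, na.last = TRUE)
res_mat
```

The same forwarding works for any other argument that sort() accepts.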

Another way would be to use ?pmax

do.call('pmax', c(as.data.frame(t(ozone)),na.rm=TRUE))
#[1]  41.0 313.0  20.1  74.0   5.0   9.0
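To see why the pmax call gives column maxima, it may help to unpack it on a tiny made-up data frame (df, x and y are hypothetical names). Transposing turns each original row into a column, so pmax compares the rows element by element, and position i of the result is the maximum of original column i.

```r
# Hypothetical sample data with one NA
df <- data.frame(x = c(1, NA, 7), y = c(10, 4, 2))

# After t(), each original row becomes one column of length 2 (x value, y value)
rows_as_cols <- as.data.frame(t(df))

# pmax() takes the parallel maximum across those three vectors:
# pmax(c(1, 10), c(NA, 4), c(7, 2), na.rm = TRUE)
col_max_pmax <- do.call(pmax, c(rows_as_cols, na.rm = TRUE))

# Same numbers as the sapply() approach, just without the column names
col_max_sapply <- sapply(df, max, na.rm = TRUE)

as.numeric(col_max_pmax)  # 7 10
col_max_sapply            # x = 7, y = 10
```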

Assuming your data is in a data frame called maxinozone, you can get the maximum of its first column like this:

max(maxinozone[, 1], na.rm = TRUE)

Here's a dplyr solution:

library(dplyr)

# find max for each column
summarise_each(ozone, funs(max(., na.rm=TRUE)))

# sort by Solar.R, descending
arrange(ozone, desc(Solar.R))

UPDATE: summarise_each() has been deprecated in favour of a more featureful family of functions: mutate_all(), mutate_at(), mutate_if(), summarise_all(), summarise_at(), summarise_if()

Here is how you could do it:

# find max for each column
ozone %>%
  summarise_if(is.numeric, ~ max(., na.rm = TRUE)) %>%
  arrange(Ozone)

or

ozone %>%
  summarise_at(vars(1:6), ~ max(., na.rm = TRUE)) %>%
  arrange(Ozone)

max(ozone$Ozone, na.rm = TRUE) should do the trick. Remember to include the na.rm = TRUE or else R will return NA.
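A quick way to see the effect of na.rm is to run max() on the nine Ozone values shown in the question (ozone_values is just a hypothetical name for that vector):

```r
# The Ozone column from the question's sample rows, including the NA in row 5
ozone_values <- c(41, 36, 12, 18, NA, 28, 23, 19, 8)

max_with_na <- max(ozone_values)                  # NA, because one value is missing
max_no_na   <- max(ozone_values, na.rm = TRUE)    # 41

# which.max() gives the position of the maximum and skips NA on its own
max_pos <- which.max(ozone_values)                # 1
```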


In response to finding the max value for each column, you could try using the apply() function:

> apply(ozone, MARGIN = 2, function(x) max(x, na.rm=TRUE))
  Ozone Solar.R    Wind    Temp   Month     Day 
   41.0   313.0    20.1    74.0     5.0     9.0 
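One caveat with apply(): it converts the data frame to a matrix first, so a single non-numeric column turns everything into character strings. The sketch below uses a made-up data frame (df and its Site column are hypothetical) and shows one possible workaround, which is to keep only the numeric columns and use vapply().

```r
# Hypothetical data: two numeric columns plus one character column
df <- data.frame(Ozone = c(41, NA, 12),
                 Temp  = c(67, 72, 74),
                 Site  = c("a", "b", "c"))

# apply() coerces df to a character matrix, so the "maxima" are strings
chr_max <- apply(df, 2, function(x) max(x, na.rm = TRUE))
is.character(chr_max)  # TRUE

# Restrict to numeric columns, then take the max of each one
numeric_cols <- df[vapply(df, is.numeric, logical(1))]
num_max <- vapply(numeric_cols, max, numeric(1), na.rm = TRUE)
num_max  # Ozone = 41, Temp = 74
```

With an all-numeric data frame like the one in the question, apply() and vapply() give the same numbers.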

max(may$Ozone, na.rm = TRUE)

Without $Ozone, max() looks at the whole data frame instead of a single column. This is covered in the swirl package.

I'm studying this course on Coursera too ~


Try this solution:

Oz <- subset(data, Month == 5, select = Ozone)  # Ozone values for May (i.e. Month == 5)
summary(Oz)                                     # summary statistics of that one-column table, including min and max

To get the max of any column you want something like:

max(ozone$Ozone, na.rm = TRUE)

To get the max of all columns, you want:

apply(ozone, 2, function(x) max(x, na.rm = TRUE))

And to sort:

ozone[order(ozone$Solar.R),]

Or to sort the other direction:

ozone[rev(order(ozone$Solar.R)),]
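One detail worth checking with these sorts is where rows with a missing Solar.R end up. The sketch below uses a small made-up data frame (df and its id column are hypothetical) to compare rev(order(...)) with the decreasing argument of order().

```r
# Hypothetical data: row 2 has a missing Solar.R value
df <- data.frame(id = 1:4, Solar.R = c(118, NA, 313, 19))

# order() puts NA last by default, so reversing the result moves NA to the top
desc_rev <- df[rev(order(df$Solar.R)), ]
desc_rev$id  # 2 3 1 4

# decreasing = TRUE sorts high to low and still keeps NA at the bottom
desc_arg <- df[order(df$Solar.R, decreasing = TRUE), ]
desc_arg$id  # 3 1 4 2
```

Which one you want depends on whether the incomplete rows should appear first or last.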

The matrixStats package provides functions for column and row summaries (see the package vignette), but you have to convert your data.frame into a matrix first.

Then you run: colMaxs(as.matrix(ozone), na.rm = TRUE)