[r] Mean of a column in a data frame, given the column's name

I'm inside a big function I have to write. In the last part I have to calculate the mean of a column in a data frame. The name of the column I am operating on is given as an argument to the function.

This question is related to r mean

The answer is


if your column contain any value that you want to neglect. it will help you

## da is data frame & Ozone is column name 

##for single column
mean(da$Ozone, na.rm = TRUE)  

##for all columns
colMeans(x=da, na.rm = TRUE)

Any of the following should work!!

df <- data.frame(x=1:3,y=4:6)

mean(df$x)
mean(df[,1])
mean(df[["x"]])

Use summarise in the dplyr package:

library(dplyr)
summarise(df, Average = mean(col_name, na.rm = T))

note: dplyr supports both summarise and summarize.


I think you're asking how to compute the mean of a variable in a data frame, given the name of the column. There are two typical approaches to doing this, one indexing with [[ and the other indexing with [:

data(iris)
mean(iris[["Petal.Length"]])
# [1] 3.758
mean(iris[,"Petal.Length"])
# [1] 3.758
mean(iris[["Sepal.Width"]])
# [1] 3.057333
mean(iris[,"Sepal.Width"])
# [1] 3.057333

I think what you are being asked to do (or perhaps asking yourself?) is take a character value which matches the name of a column in a particular dataframe (possibly also given as a character). There are two tricks here. Most people learn to extract columns with the "$" operator and that won't work inside a function if the function is passed a character vecor. If the function is also supposed to accept character argument then you will need to use the get function as well:

 df1 <- data.frame(a=1:10, b=11:20)
 mean_col <- function( dfrm, col ) mean( get(dfrm)[[ col ]] )
 mean_col("df1", "b")
 # [1] 15.5

There is sort of a semantic boundary between ordinary objects like character vectors and language objects like the names of objects. The get function is one of the functions that lets you "promote" character values to language level evaluation. And the "$" function will NOT evaluate its argument in a function, so you need to use"[[". "$" only is useful at the console level and needs to be completely avoided in functions.


Suppose you have a data frame(say df) with columns "x" and "y", you can find mean of column (x or y) using:

1.Using mean() function

z<-mean(df$x)

2.Using the column name(say x) as a variable using attach() function

 attach(df)
 mean(x)

When done you can call detach() to remove "x"

detach()

3.Using with() function, it lets you use columns of data frame as distinct variables.

 z<-with(df,mean(x))