[r] Function to calculate R2 (R-squared) in R

I have a dataframe with observed and modelled data, and I would like to calculate the R2 value. I expected there to be a function I could call for this, but can't locate one. I know I can write my own and apply it, but am I missing something obvious? I want something like

obs <- 1:5
mod <- c(0.8,2.4,2,3,4.8)
df <- data.frame(obs, mod)

R2 <- rsq(df)
# 0.85

This question is related to r function statistics

The answer is


You can also use the summary for linear models:

summary(lm(obs ~ mod, data=df))$r.squared 

Not sure why this isn't implemented directly in R, but this answer is essentially the same as Andrii's and Wordsforthewise, I just turned into a function for the sake of convenience if somebody uses it a lot like me.

r2_general <-function(preds,actual){ 
  return(1- sum((preds - actual) ^ 2)/sum((actual - mean(actual))^2))
}

Here is the simplest solution based on [https://en.wikipedia.org/wiki/Coefficient_of_determination]

# 1. 'Actual' and 'Predicted' data
df <- data.frame(
  y_actual = c(1:5),
  y_predicted  = c(0.8, 2.4, 2, 3, 4.8))

# 2. R2 Score components

# 2.1. Average of actual data
avr_y_actual <- mean(df$y_actual)

# 2.2. Total sum of squares
ss_total <- sum((df$y_actual - avr_y_actual)^2)

# 2.3. Regression sum of squares
ss_regression <- sum((df$y_predicted - avr_y_actual)^2)

# 2.4. Residual sum of squares
ss_residuals <- sum((df$y_actual - df$y_predicted)^2)

# 3. R2 Score
r2 <- 1 - ss_residuals / ss_total

Why not this:

rsq <- function(x, y) summary(lm(y~x))$r.squared
rsq(obs, mod)
#[1] 0.8560185

It is not something obvious, but the caret package has a function postResample() that will calculate "A vector of performance estimates" according to the documentation. The "performance estimates" are

  • RMSE
  • Rsquared
  • mean absolute error (MAE)

and have to be accessed from the vector like this

library(caret)
vect1 <- c(1, 2, 3)
vect2 <- c(3, 2, 2)
res <- caret::postResample(vect1, vect2)
rsq <- res[2]

However, this is using the correlation squared approximation for r-squared as mentioned in another answer. I'm not sure why Max Kuhn didn't just use the conventional 1-SSE/SST.

caret also has an R2() method, although it's hard to find in the documentation.

The way to implement the normal coefficient of determination equation is:

preds <- c(1, 2, 3)
actual <- c(2, 2, 4)
rss <- sum((preds - actual) ^ 2)
tss <- sum((actual - mean(actual)) ^ 2)
rsq <- 1 - rss/tss

Not too bad to code by hand of course, but why isn't there a function for it in a language primarily made for statistics? I'm thinking I must be missing the implementation of R^2 somewhere, or no one cares enough about it to implement it. Most of the implementations, like this one, seem to be for generalized linear models.


Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to function

$http.get(...).success is not a function Function to calculate R2 (R-squared) in R How to Call a Function inside a Render in React/Jsx How does Python return multiple values from a function? Default optional parameter in Swift function How to have multiple conditions for one if statement in python Uncaught TypeError: .indexOf is not a function Proper use of const for defining functions in JavaScript Run php function on button click includes() not working in all browsers

Examples related to statistics

Function to calculate R2 (R-squared) in R pandas: find percentile stats of a given column What exactly does numpy.exp() do? Find p-value (significance) in scikit-learn LinearRegression How to plot ROC curve in Python Pandas - Compute z-score for all columns Calculating percentile of dataset column How to normalize an array in NumPy to a unit vector? How to find row number of a value in R code np.mean() vs np.average() in Python NumPy?