[r] Seeing if data is normally distributed in R

Can someone please help me fill in the following function in R:

#data is a single vector of decimal values
normally.distributed <- function(data) {
if(data is normal)
return(TRUE)
else
return(NO)
}

This question is related to r normal-distribution

The answer is


SnowsPenultimateNormalityTest certainly has its virtues, but you may also want to look at qqnorm.

X <- rlnorm(100)
qqnorm(X)
qqnorm(rnorm(100))

I would also highly recommend the SnowsPenultimateNormalityTest in the TeachingDemos package. The documentation of the function is far more useful to you than the test itself, though. Read it thoroughly before using the test.


Consider using the function shapiro.test, which performs the Shapiro-Wilks test for normality. I've been happy with it.


In addition to qqplots and the Shapiro-Wilk test, the following methods may be useful.

Qualitative:

  • histogram compared to the normal
  • cdf compared to the normal
  • ggdensity plot
  • ggqqplot

Quantitative:

The qualitive methods can be produced using the following in R:

library("ggpubr")
library("car")

h <- hist(data, breaks = 10, density = 10, col = "darkgray") 
xfit <- seq(min(data), max(data), length = 40) 
yfit <- dnorm(xfit, mean = mean(data), sd = sd(data)) 
yfit <- yfit * diff(h$mids[1:2]) * length(data) 
lines(xfit, yfit, col = "black", lwd = 2)

plot(ecdf(data), main="CDF")
lines(ecdf(rnorm(10000)),col="red")

ggdensity(data)

ggqqplot(data)

A word of caution - don't blindly apply tests. Having a solid understanding of stats will help you understand when to use which tests and the importance of assumptions in hypothesis testing.


when you perform a test, you ever have the probabilty to reject the null hypothesis when it is true.

See the nextt R code:

p=function(n){
  x=rnorm(n,0,1)
  s=shapiro.test(x)
  s$p.value
}

rep1=replicate(1000,p(5))
rep2=replicate(1000,p(100))
plot(density(rep1))
lines(density(rep2),col="blue")
abline(v=0.05,lty=3)

The graph shows that whether you have a sample size small or big a 5% of the times you have a chance to reject the null hypothesis when it s true (a Type-I error)


The Anderson-Darling test is also be useful.

library(nortest)
ad.test(data)

library(DnE)
x<-rnorm(1000,0,1)
is.norm(x,10,0.05)