[r] Seeing if data is normally distributed in R

In addition to qqplots and the Shapiro-Wilk test, the following methods may be useful.

Qualitative:

  • histogram compared to the normal
  • cdf compared to the normal
  • ggdensity plot
  • ggqqplot

Quantitative:
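
The Shapiro-Wilk test mentioned at the top is one quantitative check. Below is a minimal sketch using base R's shapiro.test(), together with rough moment-based estimates of skewness and excess kurtosis (both should sit near 0 for normal data). The vector x is a simulated stand-in, not data from the question, so substitute your own numeric vector:

```r
set.seed(1)                        # hypothetical example data for illustration only
x <- rnorm(200, mean = 5, sd = 2)

# Shapiro-Wilk: the null hypothesis is that x comes from a normal distribution
sw <- shapiro.test(x)
sw$statistic                       # W, close to 1 for normal-looking data
sw$p.value                         # a small p-value is evidence against normality

# Rough moment-based estimates of skewness and excess kurtosis
z <- (x - mean(x)) / sd(x)
skewness <- mean(z^3)
excess_kurtosis <- mean(z^4) - 3
```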

The qualitative methods can be produced using the following in R, where data is a numeric vector:

library("ggpubr")  # provides ggdensity() and ggqqplot()
library("car")     # not used below; offers qqPlot() as another Q-Q plot

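# Histogram of counts, with a normal curve fitted to the sample mean and sd
h <- hist(data, breaks = 10, density = 10, col = "darkgray")
xfit <- seq(min(data), max(data), length.out = 40)
yfit <- dnorm(xfit, mean = mean(data), sd = sd(data))
# Rescale the density to the count scale: bin width times sample size
yfit <- yfit * diff(h$mids[1:2]) * length(data)
lines(xfit, yfit, col = "black", lwd = 2)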

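# Empirical CDF, with the normal CDF for the sample mean and sd in red
plot(ecdf(data), main = "CDF")
curve(pnorm(x, mean = mean(data), sd = sd(data)), add = TRUE, col = "red")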

ggdensity(data)

ggqqplot(data)

A word of caution: don't apply tests blindly. A solid understanding of statistics will help you judge which test to use when, and why the assumptions behind a hypothesis test matter.
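
One reason for the caution: a normality test's p-value depends on sample size as well as on how non-normal the data are. Small samples often fail to reject clearly non-normal data, while large samples can reject over deviations too small to matter in practice. A rough simulation sketch of this follows; the t distribution with 10 degrees of freedom and the sample sizes are arbitrary choices for illustration:

```r
set.seed(42)
sizes <- c(20, 200, 5000)          # shapiro.test() accepts between 3 and 5000 values

# Draw mildly heavy-tailed data at each sample size and record the p-value
p_values <- sapply(sizes, function(n) shapiro.test(rt(n, df = 10))$p.value)

data.frame(n = sizes, p_value = p_values)
```

Rerunning with different seeds shows how much the p-values move around, which is a good argument for reading the plots alongside any test result.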