Fitting a density curve to a histogram in R

Is there a function in R that fits a curve to a histogram?

Let's say you had the following histogram:

hist(c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))

It looks normal, but it's skewed. I want to fit a normal curve that is skewed to wrap around this histogram.

This question is rather basic, but I can't seem to find the answer for R on the internet.


The answer is


Here's the way I do it:

foo <- rnorm(100, mean=1, sd=2)
hist(foo, prob=TRUE)
curve(dnorm(x, mean=mean(foo), sd=sd(foo)), add=TRUE)

A bonus exercise is to do this with ggplot2 package ...
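
As a minimal sketch, the same overlay can be applied to the data from the question. The name obs is my own choice; it matters because curve() evaluates its expression over a grid called x, so the sample itself should not also be called x.

```r
# Data from the question, stored under a name other than x
obs <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))

# freq = FALSE puts the y-axis on the density scale, so the bars have total area 1
h <- hist(obs, freq = FALSE)

# Normal curve using the sample mean and standard deviation
curve(dnorm(x, mean = mean(obs), sd = sd(obs)), add = TRUE)
```

The normal curve is symmetric, so it will not follow the skew in these data.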


I had the same problem, but Dirk's solution didn't seem to work. I was getting this warning message every time:

"prob" is not a graphical parameter

I read through ?hist and found out about freq, a logical argument that defaults to TRUE when the breaks are equally spaced.

The code that worked for me is:

hist(x,freq=FALSE)
lines(density(x, na.rm=TRUE))
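
A small sketch of why the placement of na.rm matters, using made-up values (obs is a hypothetical vector with one missing entry): hist() drops missing values on its own, while density() stops with an error unless na.rm = TRUE is passed to it.

```r
obs <- c(25, 35, NA, 35, 45, 65, 35, 25)   # hypothetical sample with one NA

hist(obs, freq = FALSE)                    # the NA is dropped before binning

# density() refuses missing values by default
res <- try(density(obs), silent = TRUE)

# na.rm is an argument of density(), not of lines()
d <- density(obs, na.rm = TRUE)
lines(d)
```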

This kind of thing is easy with ggplot2:

library(ggplot2)
dataset <- data.frame(X = c(rep(65, times=5), rep(25, times=5), 
                            rep(35, times=10), rep(45, times=4)))
ggplot(dataset, aes(x = X)) + 
  geom_histogram(aes(y = after_stat(density))) + 
  geom_density()

Or, to mimic the result from Dirk's solution:

ggplot(dataset, aes(x = X)) + 
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) + 
  geom_density()
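
If you want the parametric normal curve from Dirk's answer rather than a kernel density estimate, stat_function() can draw it. A rough sketch, assuming a ggplot2 version recent enough to provide after_stat() (3.3.0 or later, as far as I know); the object name p is my own.

```r
library(ggplot2)

dataset <- data.frame(X = c(rep(65, times=5), rep(25, times=5),
                            rep(35, times=10), rep(45, times=4)))

p <- ggplot(dataset, aes(x = X)) +
  # histogram on the density scale so it is comparable with dnorm()
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) +
  # normal density with the sample mean and standard deviation
  stat_function(fun = dnorm,
                args = list(mean = mean(dataset$X), sd = sd(dataset$X)))

print(p)
```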

Dirk has explained how to plot the density function over the histogram. But sometimes you might want to make the stronger assumption of a skew-normal distribution and plot that instead. You can estimate the parameters of the distribution and plot it using the sn package:

> library(sn)
> sn.mle(y=c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))
$call
sn.mle(y = c(rep(65, times = 5), rep(25, times = 5), rep(35, 
    times = 10), rep(45, times = 4)))

$cp
    mean     s.d. skewness 
41.46228 12.47892  0.99527 

[Figure: skew-normal distributed data plot]

This probably works better on data that is more skew-normal:

[Figure: another skew-normal plot]
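
For readers who cannot run sn.mle (it comes from older releases of sn; I believe newer ones fit through selm() instead), here is a rough base-R sketch of the same idea. It does not use the sn package: dskew() is a hand-written skew-normal density, the sample is simulated rather than taken from the question, and the fit is a plain maximum-likelihood search with optim(). All names are my own.

```r
# Hand-written skew-normal density: location xi, scale omega, shape alpha
dskew <- function(x, xi, omega, alpha, as_log = FALSE) {
  z <- (x - xi) / omega
  out <- log(2) - log(omega) + dnorm(z, log = TRUE) + pnorm(alpha * z, log.p = TRUE)
  if (as_log) out else exp(out)
}

# Simulated right-skewed sample (illustrative, not the data from the question)
set.seed(42)
n <- 500
delta <- 4 / sqrt(1 + 4^2)
y <- 10 + 3 * (delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n))

# Negative log-likelihood; the scale is searched on the log scale to keep it positive
nll <- function(p) -sum(dskew(y, p[1], exp(p[2]), p[3], as_log = TRUE))
fit <- optim(c(mean(y), log(sd(y)), 1), nll)
est <- list(xi = fit$par[1], omega = exp(fit$par[2]), alpha = fit$par[3])

# Histogram on the density scale with the fitted skew-normal curve on top
hist(y, freq = FALSE)
curve(dskew(x, est$xi, est$omega, est$alpha), add = TRUE)
```

With only four distinct values, as in the question, a fit like this may be unstable, so treat the estimates with care.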

