Here is a shortened version of shasan's code, calculating the 95% confidence interval of the mean of array a:
import numpy as np, scipy.stats as st

# 95% t-interval with len(a)-1 degrees of freedom; st.sem(a) is the standard error of the mean
st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a))
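(The session below also calls a mean_confidence_interval helper. shasan's original function is not reproduced here, but a minimal sketch with the same (mean, lower, upper) return shape could look like this:)

import numpy as np, scipy.stats as st

def mean_confidence_interval(data, confidence=0.95):
    # Sketch of such a helper: t-based interval, returning (mean, lower, upper)
    data = np.asarray(data, dtype=float)
    m = np.mean(data)
    lo, hi = st.t.interval(confidence, len(data) - 1, loc=m, scale=st.sem(data))
    return m, lo, hi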
But using StatsModels' tconfint_mean is arguably even nicer:
import statsmodels.stats.api as sms

# Defaults to a two-sided 95% interval
sms.DescrStatsW(a).tconfint_mean()
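For a different confidence level, tconfint_mean accepts an alpha argument (alpha = 1 - confidence level), e.g.:

sms.DescrStatsW(a).tconfint_mean(alpha=0.01)  # 99% confidence interval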
The underlying assumption for both is that the sample (array a) was drawn independently from a normal distribution with unknown standard deviation (see MathWorld or Wikipedia).
For large sample size n, the sample mean is normally distributed, and one can calculate its confidence interval using st.norm.interval() (as suggested in Jaime's comment). But the above solutions are also correct for small n, where st.norm.interval() gives confidence intervals that are too narrow (i.e., "fake confidence"). See my answer to a similar question for more details (and one of Russ's comments here).
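To see the large-n agreement concretely, here is a quick sketch; the simulated data, seed, and distribution parameters are illustrative assumptions of mine, not part of the original answer:

import numpy as np, scipy.stats as st

rng = np.random.default_rng(0)                     # arbitrary seed, for reproducibility
big = rng.normal(loc=5.0, scale=2.0, size=10000)   # large simulated sample (n = 10000)

# With n this large, the t- and normal-based intervals agree to several decimals:
print(st.t.interval(0.95, len(big) - 1, loc=np.mean(big), scale=st.sem(big)))
print(st.norm.interval(0.95, loc=np.mean(big), scale=st.sem(big)))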
Here is an example where the correct options give (essentially) identical confidence intervals:
In [9]: a = range(10,14)
In [10]: mean_confidence_interval(a)
Out[10]: (11.5, 9.4457397432391215, 13.554260256760879)
In [11]: st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a))
Out[11]: (9.4457397432391215, 13.554260256760879)
In [12]: sms.DescrStatsW(a).tconfint_mean()
Out[12]: (9.4457397432391197, 13.55426025676088)
And finally, the incorrect result using st.norm.interval(), which is noticeably too narrow for this small sample (with only 3 degrees of freedom, the t distribution has much heavier tails than the normal):
In [13]: st.norm.interval(0.95, loc=np.mean(a), scale=st.sem(a))
Out[13]: (10.23484868811834, 12.76515131188166)