[python] Compute a confidence interval from sample data

Here a shortened version of shasan's code, calculating the 95% confidence interval of the mean of array a:

import numpy as np, scipy.stats as st

st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a))

But using StatsModels' tconfint_mean is arguably even nicer:

import statsmodels.stats.api as sms

sms.DescrStatsW(a).tconfint_mean()

The underlying assumptions for both are that the sample (array a) was drawn independently from a normal distribution with unknown standard deviation (see MathWorld or Wikipedia).

For large sample size n, the sample mean is normally distributed, and one can calculate its confidence interval using st.norm.interval() (as suggested in Jaime's comment). But the above solutions are correct also for small n, where st.norm.interval() gives confidence intervals that are too narrow (i.e., "fake confidence"). See my answer to a similar question for more details (and one of Russ's comments here).

Here an example where the correct options give (essentially) identical confidence intervals:

In [9]: a = range(10,14)

In [10]: mean_confidence_interval(a)
Out[10]: (11.5, 9.4457397432391215, 13.554260256760879)

In [11]: st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a))
Out[11]: (9.4457397432391215, 13.554260256760879)

In [12]: sms.DescrStatsW(a).tconfint_mean()
Out[12]: (9.4457397432391197, 13.55426025676088)

And finally, the incorrect result using st.norm.interval():

In [13]: st.norm.interval(0.95, loc=np.mean(a), scale=st.sem(a))
Out[13]: (10.23484868811834, 12.76515131188166)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to numpy

Unable to allocate array with shape and data type How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? Numpy, multiply array with scalar TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" Pytorch tensor to numpy array Numpy Resize/Rescale Image what does numpy ndarray shape do? How to round a numpy array? numpy array TypeError: only integer scalar arrays can be converted to a scalar index

Examples related to scipy

Reading images in python Numpy Resize/Rescale Image How to get the indices list of all NaN value in numpy array? ImportError: cannot import name NUMPY_MKL numpy.where() detailed, step-by-step explanation / examples Scikit-learn train_test_split with indices Matplotlib: Specify format of floats for tick labels Installing NumPy and SciPy on 64-bit Windows (with Pip) Can't install Scipy through pip Plotting a fast Fourier transform in Python

Examples related to statistics

Function to calculate R2 (R-squared) in R pandas: find percentile stats of a given column What exactly does numpy.exp() do? Find p-value (significance) in scikit-learn LinearRegression How to plot ROC curve in Python Pandas - Compute z-score for all columns Calculating percentile of dataset column How to normalize an array in NumPy to a unit vector? How to find row number of a value in R code np.mean() vs np.average() in Python NumPy?

Examples related to confidence-interval

How to calculate the 95% confidence interval for the slope in a linear regression model in R Compute a confidence interval from sample data