I have a pandas data frame my_df, where I can find the mean(), median(), mode() of a given column:
my_df['field_A'].mean()
my_df['field_A'].median()
my_df['field_A'].mode()
I am wondering is it possible to find more detailed stats such as 90 percentile? Thanks!
This question is related to
python
python-2.7
pandas
statistics
You can even give multiple columns with null values and get multiple quantile values (I use 95 percentile for outlier treatment)
my_df[['field_A','field_B']].dropna().quantile([0.0, .5, .90, .95])
I figured out below would work:
my_df.dropna().quantile([0.0, .9])
assume series s
s = pd.Series(np.arange(100))
Get quantiles for [.1, .2, .3, .4, .5, .6, .7, .8, .9]
s.quantile(np.linspace(.1, 1, 9, 0))
0.1 9.9
0.2 19.8
0.3 29.7
0.4 39.6
0.5 49.5
0.6 59.4
0.7 69.3
0.8 79.2
0.9 89.1
dtype: float64
OR
s.quantile(np.linspace(.1, 1, 9, 0), 'lower')
0.1 9
0.2 19
0.3 29
0.4 39
0.5 49
0.6 59
0.7 69
0.8 79
0.9 89
dtype: int32
Source: Stackoverflow.com