T-test in Pandas

Question

If I want to calculate the mean of two categories in Pandas, I can do it like this:

data = {'Category': ['cat2','cat1','cat2','cat1','cat2','cat1','cat2','cat1','cat1','cat1','cat2'],
        'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = DataFrame(data)
my_data.groupby('Category').mean()

Category    values
cat1      2.666667
cat2      1.600000

I have a lot of data formatted this way, and now I need to do a T-test to see if the means of cat1 and cat2 are statistically different. How can I do that?

User · Accepted Answer

It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as:

from scipy.stats import ttest_ind

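# Split the frame into one sub-frame per category
cat1 = my_data[my_data['Category']=='cat1']
cat2 = my_data[my_data['Category']=='cat2']

# Independent two-sample t-test on the 'values' column
ttest_ind(cat1['values'], cat2['values'])
>>> (1.4927289925706944, 0.16970867501294376)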

It returns a tuple with the t-statistic and the p-value.
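For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the question's data and unpacks the result. It assumes reasonably recent pandas and SciPy versions; the names `cat1_values`, `cat2_values`, `t_stat` and `p_value` are only illustrative.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Data from the question
data = {
    'Category': ['cat2', 'cat1', 'cat2', 'cat1', 'cat2', 'cat1',
                 'cat2', 'cat1', 'cat1', 'cat1', 'cat2'],
    'values': [1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1],
}
my_data = pd.DataFrame(data)

# Select the 'values' column for each category with a boolean mask
cat1_values = my_data.loc[my_data['Category'] == 'cat1', 'values']
cat2_values = my_data.loc[my_data['Category'] == 'cat2', 'values']

# Two-sided, independent-samples t-test (equal variances assumed by default)
t_stat, p_value = ttest_ind(cat1_values, cat2_values)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")  # t = 1.4927, p = 0.1697
```

With p around 0.17, this small sample gives no evidence at the usual 0.05 level that the two means differ.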

See here for other t-tests: http://docs.scipy.org/doc/scipy/reference/stats.html
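On the one-sided versus two-sided point: recent SciPy releases (1.6 or later, as far as I know) accept an `alternative` keyword on `ttest_ind`. The sketch below assumes such a version and hard-codes the question's values as plain lists for brevity.

```python
from scipy.stats import ttest_ind

# The question's values, split by category
cat1_values = [2, 1, 3, 2, 3, 5]
cat2_values = [1, 3, 2, 1, 1]

# Default: two-sided test (H1: the means differ)
two_sided = ttest_ind(cat1_values, cat2_values)

# One-sided test (H1: mean of cat1 is greater than mean of cat2)
one_sided = ttest_ind(cat1_values, cat2_values, alternative='greater')

print(two_sided.pvalue, one_sided.pvalue)
```

Because the t-statistic is positive here, the one-sided p-value for `'greater'` is half of the two-sided one.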

User · Answer

I simplified the code a little bit:

from scipy.stats import ttest_ind
ttest_ind(*my_data.groupby('Category')['values'].apply(lambda x: list(x)))
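The groupby one-liner relies on there being exactly two categories, passed in sorted label order. A slightly more explicit sketch, assuming the question's column names, collects each group into a dict first so the two samples are picked by name (`groups` and `result` are illustrative names):

```python
import pandas as pd
from scipy.stats import ttest_ind

my_data = pd.DataFrame({
    'Category': ['cat2', 'cat1', 'cat2', 'cat1', 'cat2', 'cat1',
                 'cat2', 'cat1', 'cat1', 'cat1', 'cat2'],
    'values': [1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1],
})

# Map each category label to a NumPy array of its values
groups = {name: grp['values'].to_numpy()
          for name, grp in my_data.groupby('Category')}

# Pick the two samples explicitly by label
result = ttest_ind(groups['cat1'], groups['cat2'])
print(result.statistic, result.pvalue)
```

This also extends naturally to more than two categories, since any pair of labels can be compared.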

User · Answer

EDIT: I had not realized this was about the data format. You could use:

import pandas as pd
import scipy.stats
two_data = pd.DataFrame(data, index=data['Category'])

Then accessing the categories is as simple as:

scipy.stats.ttest_ind(two_data.loc['cat1', 'values'], two_data.loc['cat2', 'values'], equal_var=False)

The loc operator accesses rows by label.

As G Garcia said, it depends on whether the test is one-sided or two-sided, dependent or independent.

If you have two independent samples but you do not know that they have equal variance, you can use Welch's t-test. It is as simple as:

scipy.stats.ttest_ind(cat1['values'], cat2['values'], equal_var=False)

For reasons to prefer Welch's test, see https://stats.stackexchange.com/questions/305/when-conducting-a-t-test-why-would-one-prefer-to-assume-or-test-for-equal-vari

For two dependent samples, you can use the following (the samples must be paired and of equal length):

scipy.stats.ttest_rel(cat1['values'], cat2['values'])
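To make the two variants concrete, here is a small sketch that cross-checks each against a hand computation. The Welch part uses the question's values. The `before` and `after` arrays are hypothetical paired measurements invented purely for illustration, since the question's two categories have different sizes and cannot be paired.

```python
import numpy as np
from scipy import stats

# The question's values, split by category
cat1_values = np.array([2, 1, 3, 2, 3, 5])
cat2_values = np.array([1, 3, 2, 1, 1])

# Welch's t-test: independent samples, variances not assumed equal
welch = stats.ttest_ind(cat1_values, cat2_values, equal_var=False)

# The same statistic by hand: mean difference over the unpooled standard error
se = np.sqrt(cat1_values.var(ddof=1) / cat1_values.size
             + cat2_values.var(ddof=1) / cat2_values.size)
welch_by_hand = (cat1_values.mean() - cat2_values.mean()) / se

# Paired t-test on hypothetical before/after measurements (made-up data)
before = np.array([12.1, 11.4, 13.0, 12.7, 11.9])
after = np.array([11.6, 11.5, 12.2, 12.0, 11.8])
paired = stats.ttest_rel(before, after)

# A paired t-test is a one-sample t-test on the pairwise differences
one_sample = stats.ttest_1samp(before - after, 0.0)

print(welch.statistic, welch_by_hand)
print(paired.statistic, one_sample.statistic)
```

Each pair of printed numbers should agree, which is a quick way to confirm which test a given call is actually running.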
