unique combinations of values in selected columns in pandas data frame and count

Question

I have my data in pandas data frame as follows   df1   pd DataFrame   A    yes   yes   yes   yes   no   no   yes   yes   yes   no                        B    yes   no   no   no   yes   yes   no   yes   yes   no       So  my data looks like this  ---------------------------- index         A        B 0           yes      yes 1           yes       no 2           yes       no 3           yes       no 4            no      yes 5            no      yes 6           yes       no 7           yes      yes 8           yes      yes 9            no       no -----------------------------   I would like to transform it to another data frame   The expected output can be shown in the following python script   output   pd DataFrame   A    no   no   yes   yes    B    no   yes   no   yes    count   1 2 4 3      So  my expected output looks like this  -------------------------------------------- index      A       B       count -------------------------------------------- 0         no       no        1 1         no      yes        2 2        yes       no        4 3        yes      yes        3 --------------------------------------------   Actually  I can achieve to find all combinations and count them by using the following command  mytable   df1 groupby   A   B    size    However  it turns out that such combinations are in a single column   I would like to separate each value in a combination into different column and also add one more column for the result of counting   Is it possible to do that   May I have your suggestions   Thank you in advance

User · Answer

I haven t done time test with this but it was fun to try   Basically convert two columns to one column of tuples   Now convert that to a dataframe  do  value counts    which finds the unique elements and counts them   Fiddle with zip again and put the columns in order you want   You can probably make the steps more elegant but working with tuples seems more natural to me for this problem b   pd DataFrame   A    yes   yes   yes   yes   no   no   yes   yes   yes   no    B    yes   no   no   no   yes   yes   no   yes   yes   no      b  count     pd Series zip   b A b B    df   pd DataFrame b  count   value counts   reset index    df  A    df  B     zip  df  index    df   df drop columns  index     A   B   count

User · Answer

Placing  EdChum s very nice answer into a function count unique index   The unique method only works on pandas series  not on data frames   The function below reproduces the behavior of the unique function in R      unique returns a vector  data frame or array like x but with duplicate elements rows removed    And adds a count of the occurrences as requested by the OP   df1   pd DataFrame   A    yes   yes   yes   yes   no   no   yes   yes   yes   no                                                                                                                      B    yes   no   no   no   yes   yes   no   yes   yes   no                                                                                                    def count unique index df  by                                                                                                                                                        return df groupby by  size   reset index   rename columns  0  count                                                                                                           count unique index df1    A   B                                                                                                                                                       A    B  count                                                                                                                                                                   0   no   no      1                                                                                                                                                                   1   no  yes      2                                                                                                                                                                   2  yes   no      4                                                                                                                                                                   3  yes  yes      3

User · Answer

You can groupby on cols  A  and  B  and call size and then reset index and rename the generated column   In  26    df1 groupby   A   B    size   reset index   rename columns  0  count    Out 26        A    B  count 0   no   no      1 1   no  yes      2 2  yes   no      4 3  yes  yes      3   update  A little explanation  by grouping on the 2 columns  this groups rows where A and B values are the same  we call size which returns the number of unique groups   In 202   df1 groupby   A   B    size    Out 202    A    B   no   no     1      yes    2 yes  no     4      yes    3 dtype  int64   So now to restore the grouped columns  we call reset index   In 203   df1 groupby   A   B    size   reset index    Out 203         A    B  0 0   no   no  1 1   no  yes  2 2  yes   no  4 3  yes  yes  3   This restores the indices but the size aggregation is turned into a generated column 0  so we have to rename this   In 204   df1 groupby   A   B    size   reset index   rename columns  0  count     Out 204         A    B  count 0   no   no      1 1   no  yes      2 2  yes   no      4 3  yes  yes      3   groupby does accept the arg as index which we could have set to False so it doesn t make the grouped columns the index  but this generates a series and you d still have to restore the indices and so on       In 205   df1 groupby   A   B    as index False  size    Out 205    A    B   no   no     1      yes    2 yes  no     4      yes    3 dtype  int64

User · Answer

Slightly related  I was looking for the unique combinations and I came up with this method   def unique columns df columns        result   pd Series index   df index       groups   meta data csv groupby by   columns      for name group in groups         is unique   len group     1        result loc group index    is unique      assert not result isnull   any        return result   And if you only want to assert that all combinations are unique   df1 set index   A   B    index is unique

[python] unique combinations of values in selected columns in pandas data frame and count

Examples related to python

Examples related to pandas