Pandas groupby: How to get a union of strings

Question

I have a dataframe like this:

       A         B       C
    0  1  0.749065    This
    1  2  0.301084      is
    2  3  0.463468       a
    3  4  0.643961  random
    4  1  0.866521  string
    5  2  0.120737       !

Calling

    In [10]: print(df.groupby("A")["B"].sum())

will return

    A
    1    1.615586
    2    0.421821
    3    0.463468
    4    0.643961

Now I would like to do "the same" for column "C". Because that column contains strings, `sum()` doesn't work (although you might think that it would concatenate the strings). What I would really like to see is a list or set of the strings for each group, i.e.

    A
    1    {This, string}
    2    {is, !}
    3    {a}
    4    {random}

I have been trying to find ways to do this.

`Series.unique()` (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html) doesn't work, although `df.groupby("A")["B"]` is a `pandas.core.groupby.SeriesGroupBy` object, so I was hoping any Series method would work. Any ideas?

User · Answer

Named aggregations with `pandas >= 0.25.0`

Since pandas version 0.25.0 we have named aggregations, where we can group by, aggregate, and at the same time assign new names to our columns. This way we won't get MultiIndex columns, and the column names make more sense given the data they contain.

Aggregate and get a list of strings:

    grp = df.groupby('A').agg(B_sum=('B', 'sum'),
                              C=('C', list)).reset_index()

    print(grp)
       A     B_sum               C
    0  1  1.615586  [This, string]
    1  2  0.421821         [is, !]
    2  3  0.463468             [a]
    3  4  0.643961        [random]

Aggregate and join the strings:

    grp = df.groupby('A').agg(B_sum=('B', 'sum'),
                              C=('C', ', '.join)).reset_index()

    print(grp)
       A     B_sum             C
    0  1  1.615586  This, string
    1  2  0.421821         is, !
    2  3  0.463468             a
    3  4  0.643961        random
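For anyone who wants to run this without the original session, here is a minimal self-contained sketch. It rebuilds the question's frame by hand (the literal values are copied from the question) and spells the same named aggregation with `pd.NamedAgg`, which should be equivalent to the `(column, aggfunc)` tuples used above. The output names `B_sum` and `C` simply follow this answer.

```python
import pandas as pd

# Frame rebuilt from the values shown in the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 1, 2],
    "B": [0.749065, 0.301084, 0.463468, 0.643961, 0.866521, 0.120737],
    "C": ["This", "is", "a", "random", "string", "!"],
})

# Same aggregation as in the answer, written with pd.NamedAgg
# instead of plain (column, aggfunc) tuples.
grp = df.groupby("A").agg(
    B_sum=pd.NamedAgg(column="B", aggfunc="sum"),
    C=pd.NamedAgg(column="C", aggfunc=", ".join),
).reset_index()

print(grp)
```

The tuple form is shorter; `pd.NamedAgg` mainly helps readability when there are many output columns.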

User · Answer

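If you'd like to overwrite column B in the dataframe, this should work. Note that the lambda is applied to every column other than the grouping column, so those columns all need to hold strings:

    df = df.groupby('A', as_index=False).agg(lambda x: '\n'.join(x))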

User · Answer

You can use the `apply` method to apply an arbitrary function to the grouped data. So if you want a set, apply `set`. If you want a list, apply `list`.

    >>> d
       A       B
    0  1    This
    1  2      is
    2  3       a
    3  4  random
    4  1  string
    5  2       !
    >>> d.groupby('A')['B'].apply(list)
    A
    1    [This, string]
    2           [is, !]
    3               [a]
    4          [random]
    dtype: object

If you want something else, just write a function that does what you want and then `apply` that.
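As a rough illustration of both suggestions, the sketch below applies `set` directly and then a small custom function. The helper name `sorted_unique` is made up for this example and is not part of pandas; the frame mirrors the one shown in this answer.

```python
import pandas as pd

# Frame mirroring the one shown in the answer.
d = pd.DataFrame({
    "A": [1, 2, 3, 4, 1, 2],
    "B": ["This", "is", "a", "random", "string", "!"],
})

def sorted_unique(values):
    """Hypothetical helper: the distinct strings of one group, sorted."""
    return sorted(set(values))

# One set of strings per group (element order inside a set is not defined).
as_sets = d.groupby("A")["B"].apply(set)

# One sorted list of distinct strings per group.
as_sorted = d.groupby("A")["B"].apply(sorted_unique)

print(as_sets)
print(as_sorted)
```

A set gives the "union" the question asks for; the sorted list is useful when a stable order matters, for example when comparing results between runs.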

User · Answer

You could try this:

    df.groupby('A').agg({'B': 'sum', 'C': '-'.join})

User · Answer

Following @Erfan's good answer: most of the time, when analysing aggregated values, you want only the unique strings that occur in each group, not every repetition:

    unique_chars = lambda x: ', '.join(x.unique())
    (df
     .groupby(['A'])
     .agg({'C': unique_chars}))
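The difference only shows up when a group contains repeated strings, which the question's frame does not. Here is a small sketch on made-up data (the frame below is an assumption for illustration, and `join_unique` is just a named version of the lambda above):

```python
import pandas as pd

# Hypothetical data: same columns as the question, but with repeated strings.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2],
    "C": ["This", "string", "This", "is", "is"],
})

def join_unique(x):
    # Series.unique() keeps values in order of first appearance.
    return ", ".join(x.unique())

result = df.groupby("A").agg({"C": join_unique})
print(result)
```

Each repeated string appears once per group, in the order it was first seen.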

User · Answer

You may be able to use the `aggregate` (or `agg`) function to concatenate the values. (Untested code)

    df.groupby('A')['C'].agg(lambda col: ''.join(col))

User · Answer

    In [4]: df = read_csv(StringIO(data), sep=r'\s+')

    In [5]: df
    Out[5]:
       A         B       C
    0  1  0.749065    This
    1  2  0.301084      is
    2  3  0.463468       a
    3  4  0.643961  random
    4  1  0.866521  string
    5  2  0.120737       !

    In [6]: df.dtypes
    Out[6]:
    A      int64
    B    float64
    C     object
    dtype: object

When you apply your own function, there is no automatic exclusion of non-numeric columns. This is slower, though, than applying `.sum()` to the groupby directly.

    In [8]: df.groupby('A').apply(lambda x: x.sum())
    Out[8]:
       A         B           C
    A
    1  2  1.615586  Thisstring
    2  4  0.421821         is!
    3  3  0.463468           a
    4  4  0.643961      random

`sum` by default concatenates:

    In [9]: df.groupby('A')['C'].apply(lambda x: x.sum())
    Out[9]:
    A
    1    Thisstring
    2           is!
    3             a
    4        random
    dtype: object

You can do pretty much what you want:

    In [11]: df.groupby('A')['C'].apply(lambda x: "{%s}" % ', '.join(x))
    Out[11]:
    A
    1    {This, string}
    2           {is, !}
    3               {a}
    4          {random}
    dtype: object

Doing this on a whole frame, one group at a time. The key is to return a Series:

    def f(x):
        return Series(dict(A=x['A'].sum(),
                           B=x['B'].sum(),
                           C="{%s}" % ', '.join(x['C'])))

    In [14]: df.groupby('A').apply(f)
    Out[14]:
       A         B               C
    A
    1  2  1.615586  {This, string}
    2  4  0.421821         {is, !}
    3  3  0.463468             {a}
    4  4  0.643961        {random}

User · Answer

A simple solution would be:

    >>> df.groupby(['A', 'B']).C.unique().reset_index()
