Pandas sort by group aggregate and column

Question

Given the following dataframe

In [31]: rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})

In [32]: df
Out[32]: 
     A         B      C
0  foo  1.624345  False
1  bar -0.611756   True
2  baz -0.528172  False
3  foo -1.072969   True
4  bar  0.865408  False
5  baz -2.301539   True

I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). So basically get the order of the A groups with

In [28]: df.groupby('A').sum().sort('B')
Out[28]: 
            B  C
A               
baz -2.829710  1
bar  0.253651  1
foo  0.551377  1

And then by True/False, so that it ultimately looks like this:

In [30]: df.ix[[5, 2, 1, 4, 3, 0]]
Out[30]: 
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

How can this be done?

User · Answer

One way to do this is to insert a dummy column with the sums in order to sort:

In [10]: sum_B_over_A = df.groupby('A').sum().B

In [11]: sum_B_over_A
Out[11]: 
A
bar    0.253652
baz   -2.829711
foo    0.551376
Name: B

In [12]: df['sum_B_over_A'] = df.A.apply(sum_B_over_A.get_value)

In [13]: df
Out[13]: 
     A         B      C  sum_B_over_A
0  foo  1.624345  False      0.551376
1  bar -0.611756   True      0.253652
2  baz -0.528172  False     -2.829711
3  foo -1.072969   True      0.551376
4  bar  0.865408  False      0.253652
5  baz -2.301539   True     -2.829711

In [14]: df.sort(['sum_B_over_A', 'A', 'B'])
Out[14]: 
     A         B      C   sum_B_over_A
5  baz -2.301539   True      -2.829711
2  baz -0.528172  False      -2.829711
1  bar -0.611756   True       0.253652
4  bar  0.865408  False       0.253652
3  foo -1.072969   True       0.551376
0  foo  1.624345  False       0.551376

and then you would probably drop the dummy column:

In [15]: df.sort(['sum_B_over_A', 'A', 'B']).drop('sum_B_over_A', axis=1)
Out[15]: 
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
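
On a current pandas release, where DataFrame.sort and Series.get_value are no longer available, the same dummy-column idea might be sketched roughly as follows. This is an assumed modern translation rather than the answer's original code; the names helper and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with the same seed.
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})

# Per-group sum of B, indexed by the group label.
sum_B_over_A = df.groupby('A')['B'].sum()

# Look up each row's group sum by its A label (stands in for apply(get_value)).
helper = df.assign(sum_B_over_A=df['A'].map(sum_B_over_A))

# sort_values stands in for the old DataFrame.sort; the helper column is
# dropped once the rows are in order.
result = (helper.sort_values(['sum_B_over_A', 'A', 'B'])
                .drop(columns='sum_B_over_A'))
print(result)
```

Note that, like the answer above, this sorts by B inside each group; for this data that happens to give the same row order as sorting C with True first.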

User · Answer

Here's a more concise approach...

df['a_bsum'] = df.groupby('A')['B'].transform(sum)
df.sort(['a_bsum', 'C'], ascending=[True, False]).drop('a_bsum', axis=1)

The first line adds a column to the data frame with the groupwise sum. The second line performs the sort and then removes the extra column.

Result:

     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

NOTE: sort is deprecated; use sort_values instead.
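
Following the note about sort_values, the two lines above could be written for a current pandas release roughly like this. It is a sketch under the assumption that the frame is built as in the question; ordered is an illustrative name, and assign is used so the original df is left without the helper column.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with the same seed.
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})

ordered = (
    # Broadcast each group's sum of B back onto its rows.
    df.assign(a_bsum=df.groupby('A')['B'].transform('sum'))
      # Group sum ascending, then C descending (True before False).
      .sort_values(['a_bsum', 'C'], ascending=[True, False])
      # Remove the helper column again.
      .drop(columns='a_bsum')
)
print(ordered)
```

Passing the string 'sum' instead of the builtin sum lets pandas pick its own grouped implementation.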

User · Answer

Groupby A:

In [0]: grp = df.groupby('A')

Within each group, sum over B and broadcast the values using transform. Then sort by B:

In [1]: grp[['B']].transform(sum).sort('B')
Out[1]: 
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377

Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:

In [2]: sort1 = df.ix[grp[['B']].transform(sum).sort('B').index]

In [3]: sort1
Out[3]: 
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True

Finally, sort the 'C' values within groups of 'A' using the sort=False option to preserve the A sort order from step 1:

In [4]: f = lambda x: x.sort('C', ascending=False)

In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

In [6]: sort2
Out[6]: 
         A         B      C
A                          
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False

Clean up the df index by using reset_index with drop=True:

In [7]: sort2.reset_index(0, drop=True)
Out[7]: 
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
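
The same two-stage ordering can also be reached without .ix or groupby(...).apply, both of which behave differently or are gone in recent pandas. The sketch below swaps in a different technique, two passes of a stable sort: first order rows by C, then stably reorder by the group sum so the C order survives inside each group. The names by_c, group_sum, order and result are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with the same seed.
rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})

# Pass 1: put True before False everywhere (mergesort is stable).
by_c = df.sort_values('C', ascending=False, kind='mergesort')

# Broadcast each group's sum of B onto its rows, keeping by_c's row order.
group_sum = by_c.groupby('A')['B'].transform('sum')

# Pass 2: stable sort on the group sum; ties (rows of the same group)
# keep the C ordering established in pass 1.
order = group_sum.sort_values(kind='mergesort').index

# Label-based reindexing with .loc replaces the old df.ix[...] call.
result = by_c.loc[order]
print(result)
```

Because the original index labels are kept throughout, no reset_index cleanup step is needed here.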
