The current (as of version 0.20) method for changing column names after a groupby operation is to chain the rename
method. See this deprecation note in the documentation for more detail.
This is the first result in google and although the top answer works it does not really answer the question. There is a better answer here and a long discussion on github about the full functionality of passing dictionaries to the agg
method.
These answers unfortunately do not exist in the documentation but the general format for grouping, aggregating and then renaming columns uses a dictionary of dictionaries. The keys to the outer dictionary are column names that are to be aggregated. The inner dictionaries have keys that the new column names with values as the aggregating function.
Before we get there, let's create a four column DataFrame.
df = pd.DataFrame({'A' : list('wwwwxxxx'),
'B':list('yyzzyyzz'),
'C':np.random.rand(8),
'D':np.random.rand(8)})
A B C D
0 w y 0.643784 0.828486
1 w y 0.308682 0.994078
2 w z 0.518000 0.725663
3 w z 0.486656 0.259547
4 x y 0.089913 0.238452
5 x y 0.688177 0.753107
6 x z 0.955035 0.462677
7 x z 0.892066 0.368850
Let's say we want to group by columns A, B
and aggregate column C
with mean
and median
and aggregate column D
with max
. The following code would do this.
df.groupby(['A', 'B']).agg({'C':['mean', 'median'], 'D':'max'})
D C
max mean median
A B
w y 0.994078 0.476233 0.476233
z 0.725663 0.502328 0.502328
x y 0.753107 0.389045 0.389045
z 0.462677 0.923551 0.923551
This returns a DataFrame with a hierarchical index. The original question asked about renaming the columns in the same step. This is possible using a dictionary of dictionaries:
df.groupby(['A', 'B']).agg({'C':{'C_mean': 'mean', 'C_median': 'median'},
'D':{'D_max': 'max'}})
D C
D_max C_mean C_median
A B
w y 0.994078 0.476233 0.476233
z 0.725663 0.502328 0.502328
x y 0.753107 0.389045 0.389045
z 0.462677 0.923551 0.923551
This renames the columns all in one go but still leaves the hierarchical index which the top level can be dropped with df.columns = df.columns.droplevel(0)
.