[python] Renaming Column Names in Pandas Groupby function

Q1) I want to do a groupby, SQL-style aggregation and rename the output column:

Example dataset:

>>> df
    ID     Region  count
0  100       Asia      2
1  101     Europe      3
2  102         US      1
3  103     Africa      5
4  100     Russia      5
5  101  Australia      7
6  102         US      8
7  104       Asia     10
8  105     Europe     11
9  110     Africa     23

I want to group the observations of this dataset by ID and Region and summing the count for each group. So I used something like this...

>>> print(df.groupby(['ID','Region'],as_index=False).count().sum())

    ID     Region  count
0  100       Asia      2
1  100     Russia      5
2  101  Australia      7
3  101     Europe      3
4  102         US      9
5  103     Africa      5
6  104       Asia     10
7  105     Europe     11
8  110     Africa     23

On using as_index=False I am able to get "SQL-Like" output. My problem is that I am unable to rename the aggregate variable count here. So in SQL if wanted to do the above thing I would do something like this:

select ID, Region, sum(count) as Total_Numbers
from df
group by ID, Region
order by ID, Region

As we see, it's very easy for me to rename the aggregate variable count to Total_Numbers in SQL. I wanted to do the same thing in Pandas but unable to find such an option in group-by function. Can somebody help?

The second question (more of an observation) is whether...

Q2) Is it possible to directly use column names in Pandas dataframe functions without enclosing them in quotes?

I understand that the variable names are strings, so have to be inside quotes, but I see if use them outside dataframe function and as an attribute we don't require them to be inside quotes. Like df.ID.sum() etc. It's only when we use it in a DataFrame function like df.sort() or df.groupby we have to use it inside quotes. This is actually a bit of pain as in SQL or in SAS or other languages we simply use the variable name without quoting them. Any suggestion on this?

Kindly reply to both questions (Q1 is the main, Q2 more of an opinion).

This question is related to python pandas group-by pandas-groupby rename

The answer is


The current (as of version 0.20) method for changing column names after a groupby operation is to chain the rename method. See this deprecation note in the documentation for more detail.

Deprecated Answer as of pandas version 0.20

This is the first result in google and although the top answer works it does not really answer the question. There is a better answer here and a long discussion on github about the full functionality of passing dictionaries to the agg method.

These answers unfortunately do not exist in the documentation but the general format for grouping, aggregating and then renaming columns uses a dictionary of dictionaries. The keys to the outer dictionary are column names that are to be aggregated. The inner dictionaries have keys that the new column names with values as the aggregating function.

Before we get there, let's create a four column DataFrame.

df = pd.DataFrame({'A' : list('wwwwxxxx'), 
                   'B':list('yyzzyyzz'), 
                   'C':np.random.rand(8), 
                   'D':np.random.rand(8)})

   A  B         C         D
0  w  y  0.643784  0.828486
1  w  y  0.308682  0.994078
2  w  z  0.518000  0.725663
3  w  z  0.486656  0.259547
4  x  y  0.089913  0.238452
5  x  y  0.688177  0.753107
6  x  z  0.955035  0.462677
7  x  z  0.892066  0.368850

Let's say we want to group by columns A, B and aggregate column C with mean and median and aggregate column D with max. The following code would do this.

df.groupby(['A', 'B']).agg({'C':['mean', 'median'], 'D':'max'})

            D         C          
          max      mean    median
A B                              
w y  0.994078  0.476233  0.476233
  z  0.725663  0.502328  0.502328
x y  0.753107  0.389045  0.389045
  z  0.462677  0.923551  0.923551

This returns a DataFrame with a hierarchical index. The original question asked about renaming the columns in the same step. This is possible using a dictionary of dictionaries:

df.groupby(['A', 'B']).agg({'C':{'C_mean': 'mean', 'C_median': 'median'}, 
                            'D':{'D_max': 'max'}})

            D         C          
        D_max    C_mean  C_median
A B                              
w y  0.994078  0.476233  0.476233
  z  0.725663  0.502328  0.502328
x y  0.753107  0.389045  0.389045
  z  0.462677  0.923551  0.923551

This renames the columns all in one go but still leaves the hierarchical index which the top level can be dropped with df.columns = df.columns.droplevel(0).


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to group-by

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by Count unique values using pandas groupby Pandas group-by and sum Count unique values with pandas per groups Group dataframe and get sum AND count? Error related to only_full_group_by when executing a query in MySql Pandas sum by groupby, but exclude certain columns Using DISTINCT along with GROUP BY in SQL Server Python Pandas : group by in group by and average? How do I create a new column from the output of pandas groupby().sum()?

Examples related to pandas-groupby

Count unique values with pandas per groups Group dataframe and get sum AND count? How do I create a new column from the output of pandas groupby().sum()? How to loop over grouped Pandas dataframe? Concatenate strings from several rows using Pandas groupby pandas dataframe groupby datetime month How to group dataframe rows into list in pandas groupby Renaming Column Names in Pandas Groupby function Get statistics for each group (such as count, mean, etc) using pandas GroupBy? pandas GroupBy columns with NaN (missing) values

Examples related to rename

Gradle - Move a folder from ABC to XYZ How to rename a component in Angular CLI? How do I completely rename an Xcode project (i.e. inclusive of folders)? How to rename a directory/folder on GitHub website? How do I rename both a Git local and remote branch name? Convert row to column header for Pandas DataFrame, Renaming files using node.js Replacement for "rename" in dplyr Rename multiple columns by names Rename specific column(s) in pandas