Renaming Column Names in Pandas Groupby function

Question

Q1. I want to do a groupby (SQL-style aggregation) and rename the output column.

Example dataset:

    >>> df
        ID     Region  count
    0  100       Asia      2
    1  101     Europe      3
    2  102         US      1
    3  103     Africa      5
    4  100     Russia      5
    5  101  Australia      7
    6  102         US      8
    7  104       Asia     10
    8  105     Europe     11
    9  110     Africa     23

I want to group the observations of this dataset by ID and Region and sum the count for each group. So I used something like this:

    >>> print(df.groupby(['ID', 'Region'], as_index=False)['count'].sum())
        ID     Region  count
    0  100       Asia      2
    1  100     Russia      5
    2  101  Australia      7
    3  101     Europe      3
    4  102         US      9
    5  103     Africa      5
    6  104       Asia     10
    7  105     Europe     11
    8  110     Africa     23

Using as_index=False I am able to get "SQL-like" output. My problem is that I am unable to rename the aggregate variable count here. In SQL, if I wanted to do the above, I would write something like this:

    select ID, Region, sum(count) as Total_Numbers
    from df
    group by ID, Region
    order by ID, Region

As we see, it's very easy to rename the aggregate variable count to Total_Numbers in SQL. I wanted to do the same thing in Pandas but was unable to find such an option in the groupby function. Can somebody help?

The second question, more of an observation, is this:

Q2. Is it possible to directly use column names in Pandas dataframe functions without enclosing them in quotes?

I understand that the variable names are strings, so they have to be inside quotes, but I see that if I use them outside a dataframe function, as an attribute, they don't need to be quoted, as in df.ID.sum(). It's only when we use a column in a DataFrame function like df.sort() or df.groupby that we have to put it inside quotes. This is actually a bit of a pain, as in SQL or SAS or other languages we simply use the variable name without quoting it. Any suggestion on this?

Kindly reply to both questions. Q1 is the main question; Q2 is more of an opinion.

User · Answer

The current (as of version 0.20) method for changing column names after a groupby operation is to chain the rename method. See the deprecation note in the pandas documentation for more detail.

Deprecated Answer as of pandas version 0.20

This is the first result on Google, and although the top answer works, it does not really answer the question. There is a better answer elsewhere and a long discussion on GitHub about the full functionality of passing dictionaries to the agg method.

These answers unfortunately do not exist in the documentation, but the general format for grouping, aggregating and then renaming columns uses a dictionary of dictionaries. The keys of the outer dictionary are the names of the columns to be aggregated. The inner dictionaries have the new column names as keys and the aggregating functions as values.

Before we get there, let's create a four-column DataFrame.

    df = pd.DataFrame({'A': list('wwwwxxxx'),
                       'B': list('yyzzyyzz'),
                       'C': np.random.rand(8),
                       'D': np.random.rand(8)})

       A  B         C         D
    0  w  y  0.643784  0.828486
    1  w  y  0.308682  0.994078
    2  w  z  0.518000  0.725663
    3  w  z  0.486656  0.259547
    4  x  y  0.089913  0.238452
    5  x  y  0.688177  0.753107
    6  x  z  0.955035  0.462677
    7  x  z  0.892066  0.368850

Let's say we want to group by columns A and B, aggregate column C with mean and median, and aggregate column D with max. The following code would do this.

    df.groupby(['A', 'B']).agg({'C': ['mean', 'median'], 'D': 'max'})

                D         C
              max      mean    median
    A B
    w y  0.994078  0.476233  0.476233
      z  0.725663  0.502328  0.502328
    x y  0.753107  0.389045  0.389045
      z  0.462677  0.923551  0.923551

This returns a DataFrame with hierarchical columns. The original question asked about renaming the columns in the same step. This is possible using a dictionary of dictionaries:

    df.groupby(['A', 'B']).agg({'C': {'C_mean': 'mean', 'C_median': 'median'},
                                'D': {'D_max': 'max'}})

                D         C
            D_max    C_mean  C_median
    A B
    w y  0.994078  0.476233  0.476233
      z  0.725663  0.502328  0.502328
    x y  0.753107  0.389045  0.389045
      z  0.462677  0.923551  0.923551

This renames the columns all in one go but still leaves the hierarchical column index, whose top level can be dropped with df.columns = df.columns.droplevel(0).
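As a rough illustration of the non-deprecated route, the sketch below applies the "chain the rename method" advice to the dataset from the question. It also shows named aggregation as an alternative; that part assumes pandas 0.25 or later, where the keyword form of agg is available. The nested-dictionary form shown in the deprecated answer raises an error in pandas 1.0 and later, so one of these two is likely the safer choice. The variable names renamed and named are only illustrative.

```python
import pandas as pd

# Dataset from the question
df = pd.DataFrame({
    "ID": [100, 101, 102, 103, 100, 101, 102, 104, 105, 110],
    "Region": ["Asia", "Europe", "US", "Africa", "Russia",
               "Australia", "US", "Asia", "Europe", "Africa"],
    "count": [2, 3, 1, 5, 5, 7, 8, 10, 11, 23],
})

# Option 1: aggregate, then chain rename to relabel the summed column
renamed = (
    df.groupby(["ID", "Region"], as_index=False)["count"]
      .sum()
      .rename(columns={"count": "Total_Numbers"})
)

# Option 2: named aggregation (assumption: pandas >= 0.25);
# the keyword is the output column, the tuple is (input column, function)
named = (
    df.groupby(["ID", "Region"], as_index=False)
      .agg(Total_Numbers=("count", "sum"))
)

print(renamed)
```

Both options keep ID and Region as ordinary columns because of as_index=False, which mirrors the SQL query in the question.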

User · Answer

For the first question I think answer would be:

    <your DataFrame>.rename(columns={'count': 'Total_Numbers'})

or

    <your DataFrame>.columns = ['ID', 'Region', 'Total_Numbers']

As for second one I'd say the answer would be no. It's possible to use it like 'df.ID' because of python datamodel:

    Attribute references are translated to lookups in this dictionary, e.g., m.x is equivalent to m.__dict__["x"]
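To make the Q2 point concrete, here is a small sketch of where unquoted attribute access works and where it does not. It uses a cut-down, made-up version of the question's data. One caveat worth hedging: attribute access only reaches a column when the name is a valid Python identifier and does not collide with an existing DataFrame attribute. The question's count column is exactly such a collision, because DataFrame already has a count method.

```python
import pandas as pd

# Small illustrative frame (hypothetical subset of the question's data)
df = pd.DataFrame({
    "ID": [100, 101],
    "Region": ["Asia", "Europe"],
    "count": [2, 3],
})

# Attribute access works for "ID": it is a valid identifier and
# does not clash with any DataFrame attribute.
id_total = df.ID.sum()            # same as df["ID"].sum()

# "count" clashes with the DataFrame.count method, so df.count is
# the method rather than the column.
count_is_method = callable(df.count)

# Bracket access with a quoted name always reaches the column.
count_total = df["count"].sum()
```

This is one reason quoted names remain the dependable form inside calls such as groupby: a string is unambiguous even when a column shares its name with a method.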
