Converting a Pandas GroupBy output from Series to DataFrame

Question

I'm starting with input data like this:

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

Which when printed appears as this:

           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5  Portland  Mallory

Grouping is simple enough:

    g1 = df1.groupby(["Name", "City"]).count()

and printing yields a GroupBy object:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
            Seattle      1     1

But what I want eventually is another DataFrame object that contains all the rows in the GroupBy object. In other words, I want to get the following result:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
    Mallory Seattle      1     1

I can't quite see how to accomplish this in the pandas documentation. Any hints would be welcome.

User · Answer

The solution below may be simpler:

    df1.reset_index().groupby(["Name", "City"], as_index=False).count()

User · Answer

These solutions only partially worked for me because I was doing multiple aggregations. Here is a sample output of my grouped data that I wanted to convert to a dataframe:

[image: sample grouped output]

Because I wanted more than the count provided by reset_index(), I wrote a manual method for converting that grouped output into a dataframe. I understand this is not the most pythonic/pandas way of doing this, as it is quite verbose and explicit, but it was all I needed. Basically, use the reset_index() method explained above to start a "scaffolding" dataframe, then loop through the group pairings in the grouped dataframe, retrieve the indices, perform your calculations against the ungrouped dataframe, and set the value in your new aggregated dataframe.

    df_grouped = df[['Salary Basis', 'Job Title', 'Hourly Rate', 'Male Count', 'Female Count']]
    # Default as_index=True, so that size() below returns a Series in every pandas version
    df_grouped = df_grouped.groupby(['Salary Basis', 'Job Title'])

    # Grouped gives us the indices we want for each grouping
    # We cannot convert a groupedby object back to a dataframe, so we need to do it manually
    # Create a new dataframe to work against
    df_aggregated = df_grouped.size().to_frame('Total Count').reset_index()
    df_aggregated['Male Count'] = 0
    df_aggregated['Female Count'] = 0
    df_aggregated['Job Rate'] = 0

    def manualAggregations(indices_array):
        # Positional row lookup against the ungrouped dataframe
        temp_df = df.iloc[indices_array]
        return {
            'Male Count': temp_df['Male Count'].sum(),
            'Female Count': temp_df['Female Count'].sum(),
            'Job Rate': temp_df['Hourly Rate'].max()
        }

    for name, group in df_grouped:
        ix = df_grouped.indices[name]
        calcDict = manualAggregations(ix)

        for key in calcDict:
            # name is the (Salary Basis, Job Title) pair
            columns = list(name)
            df_aggregated.loc[(df_aggregated['Salary Basis'] == columns[0]) &
                              (df_aggregated['Job Title'] == columns[1]), key] = calcDict[key]

If a dictionary isn't your thing, the calculations could be applied inline in the for loop:

        # Row mask and column in a single .loc call, to avoid chained assignment
        df_aggregated.loc[(df_aggregated['Salary Basis'] == columns[0]) &
                          (df_aggregated['Job Title'] == columns[1]), 'Male Count'] = df['Male Count'].iloc[ix].sum()
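For comparison, the same kind of multi-column summary can usually be produced without a manual loop by passing named aggregations to agg() and then calling reset_index(). The following is a minimal sketch, assuming a pandas version that supports named aggregation (0.25 or later); the sample rows and the output column names (total_count, male_count, female_count, job_rate) are made up for illustration and do not come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    "Salary Basis": ["Hourly", "Hourly", "Hourly", "Salaried"],
    "Job Title": ["Clerk", "Clerk", "Driver", "Manager"],
    "Hourly Rate": [15.0, 17.5, 20.0, 40.0],
    "Male Count": [1, 2, 0, 3],
    "Female Count": [2, 1, 4, 1],
})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
df_aggregated = (
    df.groupby(["Salary Basis", "Job Title"])
      .agg(
          total_count=("Hourly Rate", "size"),
          male_count=("Male Count", "sum"),
          female_count=("Female Count", "sum"),
          job_rate=("Hourly Rate", "max"),
      )
      .reset_index()  # move the group keys from the index back into columns
)

print(df_aggregated)
```

The result is a flat DataFrame with one row per (Salary Basis, Job Title) pair, which is the shape the manual loop builds step by step.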

User · Answer

The key is to use the reset_index() method.

Use:

    import pandas

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

    g1 = df1.groupby(["Name", "City"]).count().reset_index()

Now you have your new dataframe in g1.

User · Answer

g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name      City  City_Count  Name_Count
    0    Alice   Seattle           1           1
    1      Bob   Seattle           2           2
    2  Mallory  Portland           2           2
    3  Mallory   Seattle           1           1

Or something like:

    In [36]: DataFrame({'count': df1.groupby(["Name", "City"]).size()}).reset_index()
    Out[36]:
          Name      City  count
    0    Alice   Seattle      1
    1      Bob   Seattle      2
    2  Mallory  Portland      2
    3  Mallory   Seattle      1

User · Answer

I found this worked for me:

    import pandas as pd

    df1 = pd.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})

    # Helper columns that count() can aggregate
    df1['City_count'] = 1
    df1['Name_count'] = 1

    df1.groupby(['Name', 'City'], as_index=False).count()

User · Answer

Simply, this should do the task:

    import pandas as pd

    grouped_df = df1.groupby(["Name", "City"])

    pd.DataFrame(grouped_df.size().reset_index(name="Group_Count"))

Here, grouped_df.size() returns the number of rows in each unique (Name, City) group, and reset_index(name="Group_Count") moves the group keys back into regular columns and gives the count column the name you want. Finally, pd.DataFrame() is called to create a DataFrame object (reset_index() on a Series already returns a DataFrame, so this wrapper is optional).

User · Answer

I want to slightly change the answer given by Wes, because version 0.16.2 requires as_index=False. If you don't set it, you get an empty dataframe.

Source:

"Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you do for example DataFrame.sum() and get back a Series.

nth can act as a reducer or a filter."

    import pandas as pd

    df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
                        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
    print(df1)
    #
    #       City     Name
    #0   Seattle    Alice
    #1   Seattle      Bob
    #2  Portland  Mallory
    #3   Seattle  Mallory
    #4   Seattle      Bob
    #5  Portland  Mallory
    #
    g1 = df1.groupby(["Name", "City"], as_index=False).count()
    print(g1)
    #
    #                  City  Name
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1
    #

EDIT:

In version 0.17.1 and later you can select a subset of columns before count, and use reset_index with the parameter name after size:

    print(df1.groupby(["Name", "City"], as_index=False).count())
    #IndexError: list index out of range

    print(df1.groupby(["Name", "City"]).count())
    #Empty DataFrame
    #Columns: []
    #Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

    print(df1.groupby(["Name", "City"])[['Name', 'City']].count())
    #                  Name  City
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1

    print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
    #      Name      City  count
    #0    Alice   Seattle      1
    #1      Bob   Seattle      2
    #2  Mallory  Portland      2
    #3  Mallory   Seattle      1

The difference between count and size is that size counts NaN values while count does not.
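The size versus count distinction is easiest to see on data that actually contains a missing value. The following is a small sketch with a hypothetical Score column (not part of the question's data), in which one of Bob's two rows is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Score value.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory"],
    "Score": [1.0, 2.0, np.nan, 4.0],
})

grouped = df.groupby("Name")["Score"]

# size() counts every row in the group, including rows where Score is NaN.
sizes = grouped.size().reset_index(name="size")

# count() counts only the non-NaN values in the group.
counts = grouped.count().reset_index(name="count")

# Put both results side by side for comparison.
comparison = sizes.merge(counts, on="Name")
print(comparison)
```

Bob's group has two rows but only one non-missing Score, so the two columns differ for that row only.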

User · Answer

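    import numpy as np
    import pandas as pd

    # df is assumed to be an existing DataFrame with 'Team', 'Year' and 'W' columns
    grouped = df.groupby(['Team', 'Year'])['W'].count().reset_index()

    team_wins_df = pd.DataFrame(grouped)
    team_wins_df = team_wins_df.rename({'W': 'Wins'}, axis=1)
    team_wins_df['Wins'] = team_wins_df['Wins'].astype(np.int32)
    # reset_index() above already produced a default integer index
    print(team_wins_df)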

User · Answer

Maybe I misunderstand the question, but if you want to convert the groupby result back to a dataframe you can use .to_frame(). I wanted to reset the index when I did this, so I included that part as well.

Example code, unrelated to the question:

    df = df['TIME'].groupby(df['Name']).min()   # Series indexed by Name
    df = df.to_frame()                          # one-column DataFrame named TIME
    df = df.reset_index(level='Name')           # only Name is an index level; TIME is already a column
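Applied to the question's data, the to_frame() route might look like the sketch below. The column name city_count is chosen here purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Selecting one column before aggregating yields a Series indexed by Name.
city_counts = df1.groupby("Name")["City"].count()

# to_frame() wraps the Series in a one-column DataFrame;
# reset_index() then moves Name from the index into a regular column.
result = city_counts.to_frame(name="city_count").reset_index()

print(result)
```

This pattern only applies when the aggregation returns a Series, for example after selecting a single column; a DataFrame result needs just reset_index().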

User · Answer

I aggregated the data by summing Qty per group and stored the result in a dataframe:

    almo_grp_data = pd.DataFrame({'Qty_cnt':
        almo_slt_models_data.groupby(['orderDate', 'Item', 'State Abv'])['Qty'].sum()}).reset_index()
