Pandas groupby month and year

Question

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc.

I will be using the newly grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum, but I just can't seem to get anything to work.

Thank you for any assistance.

User · Answer

You can also do it by creating a string column with the year and month as follows:

df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')

However, this doesn't preserve chronological order when you loop over the groups, because the string keys are sorted lexicographically, e.g.

for name, group in grouped:
    print(name)

will list '2008 10' before '2008 2'.

So if you want to preserve the order, you must do as suggested by @Q-man above:

grouped = df.groupby([df.index.year, df.index.month])

This will preserve the order in the above loop:

(2007, 11)
(2007, 12)
(2008, 1)
(2008, 2)
(2008, 3)
(2008, 4)
(2008, 5)
(2008, 6)
(2008, 7)
(2008, 8)
(2008, 9)
(2008, 10)
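
To see the ordering difference concretely, here is a minimal sketch. The daily data frame and its `value` column are made up for illustration; only the grouping keys matter. It also shows one possible alternative that the answer does not mention: a zero-padded string key built with `strftime`, which sorts chronologically because every month has two digits.

```python
import pandas as pd

# Hypothetical daily data from Nov 2007 to Oct 2008 (values are made up).
idx = pd.date_range("2007-11-01", "2008-10-31", freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Unpadded string keys such as "2008 10" sort lexicographically.
str_keys = pd.Index([f"{ts.year} {ts.month}" for ts in df.index])
string_order = [name for name, _ in df.groupby(str_keys)]

# Integer (year, month) keys sort numerically, i.e. chronologically.
tuple_order = [name for name, _ in df.groupby([df.index.year, df.index.month])]

# Zero-padded strings ("2008-02") also sort chronologically.
padded_order = [name for name, _ in df.groupby(df.index.strftime("%Y-%m"))]

print(string_order)
print(tuple_order)
print(padded_order)
```

The integer keys are usually the simplest choice; the padded string is handy when a single printable label per group is wanted, for example as a plot tick label.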

User · Answer

You can use either resample or Grouper (which resamples under the hood).

First make sure that the datetime column is actually of datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M").sum()  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

Older pandas versions spelled the last call df1.resample("M", how="sum"); the how argument has since been removed. Recent versions also show 0 rather than NaN for the empty months, because the sum of an empty group is 0.

Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure). I'm including this for interest's sake.

If it's a column (it has to be a datetime64 column; as I say, hit it with to_datetime), you can use the PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex
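
As a rough illustration of the steps in this answer, the sketch below parses the question's date strings, groups by month with Grouper, and then shows one possible way to reindex the period-grouped result so that the missing months appear. The reindex step is an assumption about what the answer had in mind, not a quote from it. The sketch uses the "MS" (month start) alias for Grouper, because the month-end alias was renamed from "M" to "ME" in pandas 2.2 and "MS" behaves the same across versions; the only visible difference is that bins are labelled by the first day of the month.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the day-month-year strings into real datetimes.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Group on the column directly; every month in the range gets a bin.
monthly = df.groupby(pd.Grouper(key="Date", freq="MS"))[["abc", "xyz"]].sum()

# Group by monthly period: only months present in the data appear.
per = df["Date"].dt.to_period("M")
by_period = df.groupby(per)[["abc", "xyz"]].sum()

# Assumed reindex step: expand to every month between the first and last.
full_range = pd.period_range(by_period.index.min(), by_period.index.max(), freq="M")
filled = by_period.reindex(full_range)

print(monthly)
print(filled)
```

After the reindex, months with no rows hold NaN, which is often what a plot needs in order to show a gap instead of a zero.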

User · Answer

Why not keep it simple?

GB = DF.groupby([DF.index.year, DF.index.month]).sum()

giving you,

print(GB)
        abc  xyz
2013 6   80  250
     8   40   -5
2014 1   25   15
     2   60   80

and then you can plot as asked using,

GB.plot('abc', 'xyz', kind='scatter')
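
A self-contained version of this approach might look like the sketch below. It assumes the dates are already a DatetimeIndex; the level names `year` and `month` and the `label` column are illustrative additions, not part of the answer.

```python
import pandas as pd

DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# Sum abc and xyz within each (year, month) pair.
GB = DF.groupby([DF.index.year, DF.index.month]).sum()

# Name the two index levels so they become readable columns when flattened.
GB.index.names = ["year", "month"]
flat = GB.reset_index()

# Build a "YYYY-MM" label per row, e.g. for annotating points on a plot.
flat["label"] = flat["year"].astype(str) + "-" + flat["month"].astype(str).str.zfill(2)

print(flat)
```

With matplotlib installed, `flat.plot(x="abc", y="xyz", kind="scatter")` would then draw one point per year-month.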

User · Answer

There are different ways to do that.

I created the data frame to showcase the different techniques to filter your data.

import pandas as pd

df = pd.DataFrame({'Date': ['01-Jun-13', '03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],
                   'abc': [100, -20, 40, 25, 60],
                   'xyz': [200, 50, -5, 15, 80]})

I separated month/year/day and separated month-year as you explained.

def getMonth(s):
    return s.split("-")[1]

def getDay(s):
    return s.split("-")[0]

def getYear(s):
    return s.split("-")[2]

def getYearMonth(s):
    return s.split("-")[1] + "-" + s.split("-")[2]

I created new columns: year, month, day and YearMonth. In your case, you need only one of the two groupings. You can group using two columns ('year', 'month') or using one column (YearMonth).

df['year'] = df['Date'].apply(lambda x: getYear(x))
df['month'] = df['Date'].apply(lambda x: getMonth(x))
df['day'] = df['Date'].apply(lambda x: getDay(x))
df['YearMonth'] = df['Date'].apply(lambda x: getYearMonth(x))

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
3  20-Jan-14   25   15   14   Jan  20    Jan-14
4  21-Feb-14   60   80   14   Feb  21    Feb-14

You can go through the different groups in groupby(..) items.

In this case, we are grouping by two columns:

# sort=False keeps the groups in order of first appearance
for key, g in df.groupby(['year', 'month'], sort=False):
    print(key, g)

Output:

('13', 'Jun')         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
('13', 'Aug')         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
('14', 'Jan')         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
('14', 'Feb')         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14

In this case, we are grouping by one column:

for key, g in df.groupby('YearMonth', sort=False):
    print(key, g)

Output:

Jun-13         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
Aug-13         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
Jan-14         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
Feb-14         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14

In case you want to access a specific item, you can use get_group:

print(df.groupby('YearMonth').get_group('Jun-13'))

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

Similar to get_group, this hack helps to filter values and get the grouped values. It gives the same result:

print(df[df['YearMonth'] == 'Jun-13'])

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

You can select the list of abc or xyz values during Jun-13:

print(df[df['YearMonth'] == 'Jun-13'].abc.values)
print(df[df['YearMonth'] == 'Jun-13'].xyz.values)

Output:

[100 -20]  # abc values
[200  50]  # xyz values

You can use this to go through the dates that you have classified as "year-month" and apply criteria on them to get related data:

for x in set(df.YearMonth):
    print(df[df['YearMonth'] == x].abc.values)
    print(df[df['YearMonth'] == x].xyz.values)

I also recommend checking this answer as well.
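
As a possible alternative to the string-splitting helpers, the same columns can be derived by parsing the dates once and using the `.dt` accessor. This is a sketch of a swapped-in technique rather than the answer's own method; the `"%Y-%m"` key format is an assumption chosen so that the group keys sort chronologically.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse once; %d-%b-%y matches strings like "01-Jun-13".
dates = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Numeric year and month, plus a zero-padded key such as "2013-06".
df["year"] = dates.dt.year
df["month"] = dates.dt.month
df["YearMonth"] = dates.dt.strftime("%Y-%m")

# Monthly totals, and the raw rows for one month.
totals = df.groupby("YearMonth")[["abc", "xyz"]].sum()
june = df.groupby("YearMonth").get_group("2013-06")

print(totals)
print(june)
```

Because the keys are zero-padded and start with the four-digit year, the default sorted group order is already chronological, so `sort=False` is not needed here.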
