[python] Pandas groupby month and year

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month. ie: Group by Jan 2013, Feb 2013, Mar 2013 etc... I will be using the newly grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum but just can't seem to get anything to work.

Thank you for any assistance.

This question is related to python pandas

The answer is


You can also do it by creating a string column with the year and month as follows:

df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')

However this doesn't preserve the order when you loop over the groups, e.g.

for name, group in grouped:
    print(name)

Will give:

2007 11
2007 12
2008 1
2008 10
2008 11
2008 12
2008 2
2008 3
2008 4
2008 5
2008 6
2008 7
2008 8
2008 9
2009 1
2009 10

So then, if you want to preserve the order, you must do as suggested by @Q-man above:

grouped = df.groupby([df.index.year, df.index.month])

This will preserve the order in the above loop:

(2007, 11)
(2007, 12)
(2008, 1)
(2008, 2)
(2008, 3)
(2008, 4)
(2008, 5)
(2008, 6)
(2008, 7)
(2008, 8)
(2008, 9)
(2008, 10)

Why not keep it simple?!

GB=DF.groupby([(DF.index.year),(DF.index.month)]).sum()

giving you,

print(GB)
        abc  xyz
2013 6   80  250
     8   40   -5
2014 1   25   15
     2   60   80

and then you can plot like asked using,

GB.plot('abc','xyz',kind='scatter')

There are different ways to do that.

  • I created the data frame to showcase the different techniques to filter your data.
df = pd.DataFrame({'Date':['01-Jun-13','03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],

'abc':[100,-20,40,25,60],'xyz':[200,50,-5,15,80] })

  • I separated months/year/day and seperated month-year as you explained.
def getMonth(s):
  return s.split("-")[1]

def getDay(s):
  return s.split("-")[0]

def getYear(s):
  return s.split("-")[2]

def getYearMonth(s):
  return s.split("-")[1]+"-"+s.split("-")[2]
  • I created new columns: year, month, day and 'yearMonth'. In your case, you need one of both. You can group using two columns 'year','month' or using one column yearMonth
df['year']= df['Date'].apply(lambda x: getYear(x))
df['month']= df['Date'].apply(lambda x: getMonth(x))
df['day']= df['Date'].apply(lambda x: getDay(x))
df['YearMonth']= df['Date'].apply(lambda x: getYearMonth(x))

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
3  20-Jan-14   25   15   14   Jan  20    Jan-14
4  21-Feb-14   60   80   14   Feb  21    Feb-14
  • You can go through the different groups in groupby(..) items.

In this case, we are grouping by two columns:

for key,g in df.groupby(['year','month']):
    print key,g

Output:

('13', 'Jun')         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
('13', 'Aug')         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
('14', 'Jan')         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
('14', 'Feb')         Date  abc  xyz year month day YearMonth

In this case, we are grouping by one column:

for key,g in df.groupby(['YearMonth']):
    print key,g

Output:

Jun-13         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
Aug-13         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
Jan-14         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
Feb-14         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14
  • In case you wanna access to specific item, you can use get_group

print df.groupby(['YearMonth']).get_group('Jun-13')

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
  • Similar to get_group. This hack would help to filter values and get the grouped values.

This also would give the same result.

print df[df['YearMonth']=='Jun-13'] 

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

You can select list of abc or xyz values during Jun-13

print df[df['YearMonth']=='Jun-13'].abc.values
print df[df['YearMonth']=='Jun-13'].xyz.values

Output:

[100 -20]  #abc values
[200  50]  #xyz values

You can use this to go through the dates that you have classified as "year-month" and apply cretiria on it to get related data.

for x in set(df.YearMonth): 
    print df[df['YearMonth']==x].abc.values
    print df[df['YearMonth']==x].xyz.values

I recommend also to check this answer as well.