I have the following dataframe:
Date abc xyz
01-Jun-13 100 200
03-Jun-13 -20 50
15-Aug-13 40 -5
20-Jan-14 25 15
21-Feb-14 60 80
I need to group the data by year and month. ie: Group by Jan 2013, Feb 2013, Mar 2013 etc... I will be using the newly grouped data to create a plot showing abc vs xyz per year/month.
I've tried various combinations of groupby and sum but just can't seem to get anything to work.
Thank you for any assistance.
You can also do it by creating a string column with the year and month as follows:
df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')
However this doesn't preserve the order when you loop over the groups, e.g.
for name, group in grouped:
print(name)
Will give:
2007 11
2007 12
2008 1
2008 10
2008 11
2008 12
2008 2
2008 3
2008 4
2008 5
2008 6
2008 7
2008 8
2008 9
2009 1
2009 10
So then, if you want to preserve the order, you must do as suggested by @Q-man above:
grouped = df.groupby([df.index.year, df.index.month])
This will preserve the order in the above loop:
(2007, 11)
(2007, 12)
(2008, 1)
(2008, 2)
(2008, 3)
(2008, 4)
(2008, 5)
(2008, 6)
(2008, 7)
(2008, 8)
(2008, 9)
(2008, 10)
Why not keep it simple?!
GB=DF.groupby([(DF.index.year),(DF.index.month)]).sum()
giving you,
print(GB)
abc xyz
2013 6 80 250
8 40 -5
2014 1 25 15
2 60 80
and then you can plot like asked using,
GB.plot('abc','xyz',kind='scatter')
There are different ways to do that.
df = pd.DataFrame({'Date':['01-Jun-13','03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],
'abc':[100,-20,40,25,60],'xyz':[200,50,-5,15,80] })
def getMonth(s): return s.split("-")[1] def getDay(s): return s.split("-")[0] def getYear(s): return s.split("-")[2] def getYearMonth(s): return s.split("-")[1]+"-"+s.split("-")[2]
year
, month
, day
and 'yearMonth
'. In your case, you need one of both. You can group using two columns 'year','month'
or using one column yearMonth
df['year']= df['Date'].apply(lambda x: getYear(x)) df['month']= df['Date'].apply(lambda x: getMonth(x)) df['day']= df['Date'].apply(lambda x: getDay(x)) df['YearMonth']= df['Date'].apply(lambda x: getYearMonth(x))
Output:
Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
2 15-Aug-13 40 -5 13 Aug 15 Aug-13
3 20-Jan-14 25 15 14 Jan 20 Jan-14
4 21-Feb-14 60 80 14 Feb 21 Feb-14
In this case, we are grouping by two columns:
for key,g in df.groupby(['year','month']): print key,g
Output:
('13', 'Jun') Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
('13', 'Aug') Date abc xyz year month day YearMonth
2 15-Aug-13 40 -5 13 Aug 15 Aug-13
('14', 'Jan') Date abc xyz year month day YearMonth
3 20-Jan-14 25 15 14 Jan 20 Jan-14
('14', 'Feb') Date abc xyz year month day YearMonth
In this case, we are grouping by one column:
for key,g in df.groupby(['YearMonth']): print key,g
Output:
Jun-13 Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
Aug-13 Date abc xyz year month day YearMonth
2 15-Aug-13 40 -5 13 Aug 15 Aug-13
Jan-14 Date abc xyz year month day YearMonth
3 20-Jan-14 25 15 14 Jan 20 Jan-14
Feb-14 Date abc xyz year month day YearMonth
4 21-Feb-14 60 80 14 Feb 21 Feb-14
get_group
print df.groupby(['YearMonth']).get_group('Jun-13')
Output:
Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
get_group
. This hack would help to filter values and get the grouped values.This also would give the same result.
print df[df['YearMonth']=='Jun-13']
Output:
Date abc xyz year month day YearMonth
0 01-Jun-13 100 200 13 Jun 01 Jun-13
1 03-Jun-13 -20 50 13 Jun 03 Jun-13
You can select list of abc
or xyz
values during Jun-13
print df[df['YearMonth']=='Jun-13'].abc.values
print df[df['YearMonth']=='Jun-13'].xyz.values
Output:
[100 -20] #abc values
[200 50] #xyz values
You can use this to go through the dates that you have classified as "year-month" and apply cretiria on it to get related data.
for x in set(df.YearMonth):
print df[df['YearMonth']==x].abc.values
print df[df['YearMonth']==x].xyz.values
I recommend also to check this answer as well.
Source: Stackoverflow.com