How to group pandas DataFrame entries by date in a non-unique column

Question

A pandas DataFrame contains a column named date that holds non-unique datetime values. I can group the rows of this frame using:

    data.groupby(data['date'])

However, this splits the data by the individual datetime values. I would like to group the data by the year stored in the date column. This page shows how to group by year when the timestamp is used as the index, which is not true in my case. How do I achieve this grouping?

User · Answer

This will also work:

    data.groupby(data['date'].dt.year)
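A minimal sketch of this answer on a small, hypothetical frame (the column name value and the sample dates are assumptions, not from the question). The dt.year Series is aligned with the frame's index, so groupby can use it as the key without moving date into the index.

```python
import pandas as pd

# Hypothetical sample data: 'date' holds non-unique timestamps.
data = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-01", "2012-06-01",
                            "2015-01-01", "2015-03-01"]),
    "value": [1, 2, 3, 4, 5],
})

# .dt.year returns one year per row; groupby uses it as the grouping key.
by_year = data.groupby(data["date"].dt.year)["value"].sum()
print(by_year)
```

Rows from 2012 collapse into one group and rows from 2015 into another, regardless of how many share the same timestamp.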

User · Answer

This should work:

    data.groupby(lambda x: data['date'][x].year)
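When groupby receives a function, it calls it once per index label; here the lambda uses that label to look up the row's timestamp and returns its year. The sketch below, on hypothetical sample data, checks that this produces the same groups as the dt.year approach (only the index name differs).

```python
import pandas as pd

# Hypothetical sample data with a default integer index.
data = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-06-01",
                            "2015-01-01", "2015-02-01"]),
    "value": [9, 5, 1, 2],
})

# The callable is applied to each index label x; it returns that row's year.
by_lambda = data.groupby(lambda x: data["date"][x].year)["value"].sum()

# Vectorized equivalent using the dt accessor, for comparison.
by_dt = data.groupby(data["date"].dt.year)["value"].sum()

print(by_lambda)
```

Because the lambda runs in Python once per row, it is usually slower on large frames than a vectorized key such as dt.year.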

User · Answer

I'm using pandas 0.16.2. This has better performance on my large dataset:

    data.groupby(data.date.dt.year)

Using the dt accessor, playing around with weekofyear, dayofweek, etc. becomes far easier.
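A sketch of grouping by other dt properties, on hypothetical data. Note that Series.dt.weekofyear, mentioned in this answer, was deprecated in pandas 1.1 and removed in pandas 2.0; dt.isocalendar().week is the replacement used here. dayofweek is still available (Monday is 0).

```python
import pandas as pd

# Hypothetical sample data; 2024-01-01 fell on a Monday (ISO week 1).
data = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08",
                            "2024-01-09", "2024-01-15"]),
    "value": [1, 2, 3, 4, 5],
})

# Group by day of week: Monday=0 ... Sunday=6.
by_weekday = data.groupby(data["date"].dt.dayofweek)["value"].sum()

# Group by ISO week number (replacement for the removed .dt.weekofyear).
by_week = data.groupby(data["date"].dt.isocalendar().week)["value"].sum()

# Several keys at once: one group per (ISO year, ISO week) pair.
iso = data["date"].dt.isocalendar()
by_year_week = data.groupby([iso.year, iso.week])["value"].sum()

print(by_weekday)
print(by_week)
print(by_year_week)
```

Pairing the ISO year with the ISO week avoids merging week 1 of different years into one group when the data spans several years.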

User · Answer

ecatmur's solution will work fine. This will give better performance on large datasets, though:

    data.groupby(data['date'].map(lambda x: x.year))
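The sketch below checks that map(lambda x: x.year) yields the same groups as dt.year and times the two key computations on a hypothetical 100,000-row frame. The claim being tested here is an assumption, not a measured result: map still calls a Python function per element, while dt.year is vectorized, so dt.year is generally expected to be faster. The printed timings depend on your machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical frame: 1,000 consecutive days, each repeated 100 times.
dates = pd.date_range("2000-01-01", periods=1_000, freq="D").repeat(100)
data = pd.DataFrame({"date": dates, "value": 1})

# Both keys produce the same year per row, so the groups are identical.
by_map = data.groupby(data["date"].map(lambda x: x.year))["value"].sum()
by_dt = data.groupby(data["date"].dt.year)["value"].sum()

# Time only the key computation; numbers vary by machine.
t_map = timeit.timeit(lambda: data["date"].map(lambda x: x.year), number=3)
t_dt = timeit.timeit(lambda: data["date"].dt.year, number=3)
print(f"map: {t_map:.3f}s  dt.year: {t_dt:.3f}s")
```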

User · Answer

This might be easier to explain with a sample dataset.

Create sample data

Let's assume we have a single column of Timestamps, date, and another column we would like to perform an aggregation on, a.

    import pandas as pd

    df = pd.DataFrame({'date': pd.DatetimeIndex(['2012-1-1', '2012-6-1', '2015-1-1', '2015-2-1', '2015-3-1']),
                       'a': [9, 5, 1, 2, 3]}, columns=['date', 'a'])
    df

            date  a
    0 2012-01-01  9
    1 2012-06-01  5
    2 2015-01-01  1
    3 2015-02-01  2
    4 2015-03-01  3

There are several ways to group by year:

1. Use the dt accessor with the year property
2. Put date in the index and use an anonymous function to access the year
3. Use the resample method
4. Convert to a pandas Period

dt accessor with year property

When you have a column (and not an index) of pandas Timestamps, you can access many extra properties and methods with the dt accessor. For instance:

    df['date'].dt.year

    0    2012
    1    2012
    2    2015
    3    2015
    4    2015
    Name: date, dtype: int64

We can use this to form our groups and calculate some aggregations on a particular column:

    df.groupby(df['date'].dt.year)['a'].agg(['sum', 'mean', 'max'])

          sum  mean  max
    date
    2012   14     7    9
    2015    6     2    3

Put date in the index and use an anonymous function to access the year

If you set the date column as the index, it becomes a DatetimeIndex with the same properties and methods that the dt accessor gives normal columns:

    df1 = df.set_index('date')
    df1.index.year

    Int64Index([2012, 2012, 2015, 2015, 2015], dtype='int64', name='date')

Interestingly, when using the groupby method, you can pass it a function. The function is called on each label of the DataFrame's index, here each Timestamp. So we can get the same result as above with the following:

    df1.groupby(lambda x: x.year)['a'].agg(['sum', 'mean', 'max'])

          sum  mean  max
    2012   14     7    9
    2015    6     2    3

Use the resample method

If your date column is not in the index, you must specify the column with the on parameter. You also need to specify the offset alias as a string:

    df.resample('AS', on='date')['a'].agg(['sum', 'mean', 'max'])

                 sum  mean  max
    date
    2012-01-01  14.0   7.0  9.0
    2013-01-01   NaN   NaN  NaN
    2014-01-01   NaN   NaN  NaN
    2015-01-01   6.0   2.0  3.0

Unlike groupby, resample creates a row for every year in the range, including years with no data (2013 and 2014 here). 'AS' (year start) is equivalent to 'YS'; newer pandas versions deprecate 'AS' in favor of 'YS'.

Convert to pandas Period

You can also convert the date column to a pandas Period object. We must pass in the offset alias as a string to determine the length of the Period:

    df['date'].dt.to_period('A')

    0   2012
    1   2012
    2   2015
    3   2015
    4   2015
    Name: date, dtype: object

'A' and 'Y' are equivalent annual aliases; newer pandas versions deprecate 'A' in favor of 'Y'. We can then use this as a group:

    df.groupby(df['date'].dt.to_period('Y'))['a'].agg(['sum', 'mean', 'max'])

          sum  mean  max
    2012   14     7    9
    2015    6     2    3
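A side-by-side check of the four approaches on the same sample data. This is a sketch, not part of the original answer: it assumes zero-padded date strings and uses the 'YS' and 'Y' aliases (equivalent to 'AS' and 'A'). The variable names r1 to r4 are illustrative only.

```python
import pandas as pd

# Same sample data as above, with zero-padded date strings.
df = pd.DataFrame({
    "date": pd.DatetimeIndex(["2012-01-01", "2012-06-01", "2015-01-01",
                              "2015-02-01", "2015-03-01"]),
    "a": [9, 5, 1, 2, 3],
})
aggs = ["sum", "mean", "max"]

# 1. dt accessor: group by a Series of years.
r1 = df.groupby(df["date"].dt.year)["a"].agg(aggs)

# 2. Date in the index: the callable receives each index label (a Timestamp).
r2 = df.set_index("date").groupby(lambda x: x.year)["a"].agg(aggs)

# 3. resample: one row per calendar year, including empty years.
r3 = df.resample("YS", on="date")["a"].agg(aggs)

# 4. Period: annual periods as group keys.
r4 = df.groupby(df["date"].dt.to_period("Y"))["a"].agg(aggs)

print(r1)
print(r3)
```

The three groupby-based results contain only the years present in the data, while the resample result has four rows because it also covers 2013 and 2014.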
