[r] Extract year from date

How can I remove the first elements from a variable, especially if this variable has a special characters. For example, I have the following column:

Date
01/01/2009
01/01/2010
01/01/2011
01/01/2012

I need to have a new column like the following:

Date
2009
2010
2011
2012

This question is related to r date

The answer is


As discussed in the comments, this can be achieved by converting the entry into Date format and extracting the year, for instance like this:

format(as.Date(df1$Date, format="%d/%m/%Y"),"%Y")

If you are using the date package, this can be done fairly easily.

library(date)
Date <- c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012")
Date <- as.date(Date)
Date
# [1] 1Jan2009 1Jan2010 1Jan2011 1Jan2012
date.mdy(Date)$year
# [1] 2009 2010 2011 2012

## be aware that these are now integers and thus different methods may be invoked:
str(date.mdy(Date)$year)
# int [1:4] 2009 2010 2011 2012
summary(Date)
#     First      Last   
# "1Jan2009" "1Jan2012" 
summary(date.mdy(Date)$year)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#    2009    2010    2010    2010    2011    2012 

For some time now, you can also only rely on the data.table package and its IDate class plus associated functions (Check ?as.IDate()). So, no need to additionally install lubridate.

require(data.table)

a <- c("01/01/2009", "01/01/2010" , "01/01/2011")
year(as.IDate(a, '%d/%m/%Y')) # all data.table functions


When you convert your variable to Date:

date <-  as.Date('10/30/2018','%m/%d/%Y')

you can then cut out the elements you want and make new variables, like year:

year <- as.numeric(format(date,'%Y'))

or month:

month <- as.numeric(format(date,'%m'))

This is more advice than a specific answer, but my suggestion is to convert dates to date variables immediately, rather than keeping them as strings. This way you can use date (and time) functions on them, rather than trying to use very troublesome workarounds.

As pointed out, the lubridate package has nice extraction functions.

For some projects, I have found that piecing dates out from the start is helpful: create year, month, day (of month) and day (of week) variables to start with. This can simplify summaries, tables and graphs, because the extraction code is separate from the summary/table/graph code, and because if you need to change it, you don't have to roll out those changes in multiple spots.