[r] calculating number of days between 2 columns of dates in data frame

I have a data frame which has two columns of dates in the format yyyy/mm/dd. I am trying to calculate the number of days between these two dates for each observation within the data frame (and create a new variable with this number of days in it).

So far I have tried using the answer given here:

Calculate the number of weekdays between 2 dates in R

but editing the code so that it calculates the total number of days rather than just the number of weekdays. This produced the following error:

Error in del/by : non-numeric argument to binary operator
In addition: Warning message:
In Ops.factor(to, from) : - not meaningful for factors

I have also attempted using this code:

finish <- as.Date(survey$date, format="%yyyy/%mm/%dd")
start <- as.Date(survey$tx_start, format="%yyyy/%mm/%dd")
date_diff<-as.data.frame(finish-start)

with the plan of using "cbind" or something equivalent to combine the data frame "days" to my data frame "survey" which has data in it.

Although this does not give me any errors, the observations within the "finish" and "start" objects are all "NA_real_", and the date_diff data frame therefore has all the observations listed as NA.

If someone could point me in the right direction that would be great! All the other questions I have found seem to deal with individual dates rather than dates stored in variables, and applying those techniques to variables hasn't been working for me.

This question is related to r date

The answer is


Without seeing your data (you can use the output of dput(head(survey)) to show us), this is a shot in the dark:

survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))

survey$date_diff <- as.Date(as.character(survey$date), format="%Y/%m/%d")-
                  as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
survey
        date   tx_start date_diff
1 2012/07/26 2012/01/01  207 days
2 2012/07/25 2012/01/01  206 days

You need to use the as.Date formats correctly.

Eg.

x = '2012/07/25'
xd = as.Date(x,'%Y/%m/%d')
xd    # Prints "2012-07-25"

R date formats are similar to the *nix ones.

typeof(xd) returns "double": a Date is stored as the number of days since 1970-01-01.
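As a minimal sketch of why the format string matters, the snippet below parses one illustrative date (not taken from the asker's data) with the format from the question and with the corrected one. It assumes the date is stored as text.

```r
# Illustrative value, not taken from the asker's data
x <- "2012/07/25"

# "%yyyy/%mm/%dd" is not a valid specification, so parsing fails and gives NA
bad <- as.Date(x, format = "%yyyy/%mm/%dd")

# %Y = four-digit year, %m = month number, %d = day of the month
good <- as.Date(x, format = "%Y/%m/%d")

is.na(bad)    # TRUE
format(good)  # "2012-07-25"

# The underlying number is a count of days since 1970-01-01
days_since_epoch <- as.numeric(good)
```

The NA from the first call matches the all-NA result described in the question.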


Following Ronald's example, I would add that you should consider whether the start and end dates should themselves be included in the count of days between two dates. I faced the same problem and ended up using a third option with apply. It may be memory inefficient, but it helps to understand the problem:

survey <- data.frame(date=c("2012/07/26","2012/07/25"),tx_start=c("2012/01/01","2012/01/01"))

survey$diff_1 <- as.numeric(
  as.Date(as.character(survey$date), format="%Y/%m/%d")-
    as.Date(as.character(survey$tx_start), format="%Y/%m/%d")
)

survey$diff_2<- as.numeric(
  difftime(survey$date ,survey$tx_start , units = c("days"))
)

survey$diff_3 <- apply(X = survey[,c("date", "tx_start")],
                       MARGIN = 1,
                       FUN = function(x)
                         length(
                           seq.Date(
                             from = as.Date(x[2]),
                             to = as.Date(x[1]),
                             by = "day")
                           )
                       )

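This gives the following date differences. diff_1 counts the days elapsed, and diff_3 is one higher because it counts both the start and end dates. diff_2 is fractional (0.9583 is 23/24 of a day), presumably because difftime converts the strings to date-times in the local time zone and the daylight-saving change between January and July removes one hour: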

        date   tx_start diff_1   diff_2 diff_3
1 2012/07/26 2012/01/01    207 206.9583    208
2 2012/07/25 2012/01/01    206 205.9583    207
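One way to make the inclusive/exclusive choice explicit without apply is sketched below. It assumes the same two text columns as above; the column names days_elapsed and days_inclusive are made up for this illustration.

```r
survey <- data.frame(
  date     = c("2012/07/26", "2012/07/25"),
  tx_start = c("2012/01/01", "2012/01/01"),
  stringsAsFactors = FALSE
)

# Parse once, so the arithmetic works on Date objects rather than text
end_date   <- as.Date(survey$date, format = "%Y/%m/%d")
start_date <- as.Date(survey$tx_start, format = "%Y/%m/%d")

# Exclusive count: whole days elapsed between the two dates
survey$days_elapsed <- as.numeric(end_date - start_date)

# Inclusive count: the start day and the end day are both counted
survey$days_inclusive <- survey$days_elapsed + 1

survey
```

Because both columns are converted to Date before subtracting, the result is a whole number of days and does not depend on the local time zone.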

In Ronald's example, if your dates are in a different format (as shown below), modify the format parameter to match:

survey <- data.frame(date=c("2012-07-26","2012-07-25"),tx_start=c("2012-01-01","2012-01-01"))

survey$date_diff <- as.Date(as.character(survey$date), format="%Y-%m-%d")-
              as.Date(as.character(survey$tx_start), format="%Y-%m-%d")

survey
        date   tx_start date_diff
1 2012-07-26 2012-01-01  207 days
2 2012-07-25 2012-01-01  206 days

You could find the difference between dates in columns in a data frame by using the function difftime as follows:

df$diff_in_days<- difftime(df$datevar1 ,df$datevar2 , units = c("days"))
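A hedged, self-contained version of this call is sketched below. The data frame df and its values are hypothetical; only the column names follow the line above. It assumes the columns start out as text in yyyy-mm-dd form.

```r
# Hypothetical data; df, datevar1 and datevar2 mirror the names used above
df <- data.frame(
  datevar1 = c("2012-07-26", "2012-07-25"),
  datevar2 = c("2012-01-01", "2012-01-01"),
  stringsAsFactors = FALSE
)

# Convert to Date first so the result does not depend on the local time zone
df$datevar1 <- as.Date(df$datevar1)
df$datevar2 <- as.Date(df$datevar2)

# A difftime column that carries its units
df$diff_in_days <- difftime(df$datevar1, df$datevar2, units = "days")

# A plain numeric column, convenient for further arithmetic
df$diff_numeric <- as.numeric(df$diff_in_days, units = "days")
```

Keeping the difftime column preserves the unit label when printing, while the numeric copy is easier to use in calculations.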