[python] Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes

I have two columns, fromdate and todate, in a dataframe.

import pandas as pd

data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}

df = pd.DataFrame(data)

I add a new column, diff, to find the difference between the two dates using

df['diff'] = df['fromdate'] - df['todate']

I get the diff column, but it contains days, when there's more than 24 hours.

                   todate                fromdate                   diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000

How do I convert my results to only hours and minutes (i.e. days are converted to hours)?

This question is related to python pandas datetime python-datetime

The answer is


  • How do I convert my results to only hours and minutes
    • The accepted answer only returns days + hours. Minutes are not included.
  • To provide a column that has hours and minutes, as hh:mm or x hours y minutes, would require additional calculations and string formatting.
  • This answer shows how to get either total hours or total minutes as a float, using timedelta math, and is faster than using .astype('timedelta64[h]')
  • Pandas Time Deltas User Guide
  • Pandas Time series / date functionality User Guide
  • python timedelta objects: See supported operations.
  • The following sample data is already a datetime64[ns] dtype. It is required that all relevant columns are converted using pandas.to_datetime().
import pandas as pd

# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}

# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)

# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
df['time_delta'] = (df.from_date - df.to_date)

# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)

# create a colume with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)

# display(df)
                  to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000

Other methods

  • An item of note from the podcast in Other Resources, .total_seconds() was added and merged when the core developer was on vacation, and would not have been approved.
    • This is also why there aren't other .total_xx methods.
# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0    211089.82
1     13264.30
2     31373.76
dtype: float64

# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0    2
1    0
2    0
dtype: int64

# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0    38289
1    13264
2    31373
dtype: int64

Other Resources

%%timeit test

import pandas as pd

# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)

%%timeit
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
(df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This was driving me bonkers as the .astype() solution above didn't work for me. But I found another way. Haven't timed it or anything, but might work for others out there:

t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')

print pd.Timedelta(t2 - t1).seconds / 3600.0

...if you want hours. Or:

print pd.Timedelta(t2 - t1).seconds / 60.0

...if you want minutes.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to datetime

Comparing two joda DateTime instances How to format DateTime in Flutter , How to get current time in flutter? How do I convert 2018-04-10T04:00:00.000Z string to DateTime? How to get current local date and time in Kotlin Converting unix time into date-time via excel Convert python datetime to timestamp in milliseconds SQL Server date format yyyymmdd Laravel Carbon subtract days from current date Check if date is a valid one Why is ZoneOffset.UTC != ZoneId.of("UTC")?

Examples related to python-datetime

Getting today's date in YYYY-MM-DD in Python? Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes Convert DataFrame column type from string to datetime, dd/mm/yyyy format Python datetime strptime() and strftime(): how to preserve the timezone information How to convert a time string to seconds? Convert a python UTC datetime to a local datetime using only python standard library?