[python] Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while preserving its timezone?

An example:

In [82]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10, freq='s', tz="Europe/Brussels")

In [83]: t
Out[83]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels

I could remove the timezone by setting it to None, but then the result is converted to UTC (12 o'clock became 10):

In [86]: t.tz = None

In [87]: t
Out[87]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 10:00:00, ..., 2013-05-18 10:00:09]
Length: 10, Freq: S, Timezone: None

Is there another way I can convert a DateTimeIndex to timezone naive, but while preserving the timezone it was set in?


Some context on the reason I am asking this: I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on).
But for some reason, I have to deal with a timezone-aware timeseries in my local timezone (Europe/Brussels). As all my other data are timezone naive (but represented in my local timezone), I want to convert this timeseries to naive to further work with it, but it also has to be represented in my local timezone (so just remove the timezone info, without converting the user-visible time to UTC).

I know the time is actually internal stored as UTC and only converted to another timezone when you represent it, so there has to be some kind of conversion when I want to "delocalize" it. For example, with the python datetime module you can "remove" the timezone like this:

In [119]: d = pd.Timestamp("2013-05-18 12:00:00", tz="Europe/Brussels")

In [120]: d
Out[120]: <Timestamp: 2013-05-18 12:00:00+0200 CEST, tz=Europe/Brussels>

In [121]: d.replace(tzinfo=None)
Out[121]: <Timestamp: 2013-05-18 12:00:00> 

So, based on this, I could do the following, but I suppose this will not be very efficient when working with a larger timeseries:

In [124]: t
Out[124]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels

In [125]: pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
Out[125]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: None, Timezone: None

This question is related to python pandas datetime timezone

The answer is


The most important thing is add tzinfo when you define a datetime object.

from datetime import datetime, timezone
from tzinfo_examples import HOUR, Eastern
u0 = datetime(2016, 3, 13, 5, tzinfo=timezone.utc)
for i in range(4):
     u = u0 + i*HOUR
     t = u.astimezone(Eastern)
     print(u.time(), 'UTC =', t.time(), t.tzname())

Late contribution but just came across something similar in Python datetime and pandas give different timestamps for the same date.

If you have timezone-aware datetime in pandas, technically, tz_localize(None) changes the POSIX timestamp (that is used internally) as if the local time from the timestamp was UTC. Local in this context means local in the specified timezone. Ex:

import pandas as pd

t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H', tz="US/Central")
# DatetimeIndex(['2013-05-18 12:00:00-05:00', '2013-05-18 13:00:00-05:00'], dtype='datetime64[ns, US/Central]', freq='H')

t_loc = t.tz_localize(None)
# DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'], dtype='datetime64[ns]', freq='H')

# offset in seconds according to timezone:
(t_loc.values-t.values)//1e9
# array([-18000, -18000], dtype='timedelta64[ns]')

Note that this will leave you with strange things during DST transitions, e.g.

t = pd.date_range(start="2020-03-08 01:00:00", periods=2, freq='H', tz="US/Central")
(t.values[1]-t.values[0])//1e9
# numpy.timedelta64(3600,'ns')

t_loc = t.tz_localize(None)
(t_loc.values[1]-t_loc.values[0])//1e9
# numpy.timedelta64(7200,'ns')

In contrast, tz_convert(None) does not modify the internal timestamp, it just removes the tzinfo.

t_utc = t.tz_convert(None)
(t_utc.values-t.values)//1e9
# array([0, 0], dtype='timedelta64[ns]')

My bottom line would be: stick with timezone-aware datetime if you can or only use t.tz_convert(None) which doesn't modify the underlying POSIX timestamp. Just keep in mind that you're practically working with UTC then.

(Python 3.8.2 x64 on Windows 10, pandas v1.0.5.)


Setting the tz attribute of the index explicitly seems to work:

ts_utc = ts.tz_convert("UTC")
ts_utc.index.tz = None

The accepted solution does not work when there are multiple different timezones in a Series. It throws ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

The solution is to use the apply method.

Please see the examples below:

# Let's have a series `a` with different multiple timezones. 
> a
0    2019-10-04 16:30:00+02:00
1    2019-10-07 16:00:00-04:00
2    2019-09-24 08:30:00-07:00
Name: localized, dtype: object

> a.iloc[0]
Timestamp('2019-10-04 16:30:00+0200', tz='Europe/Amsterdam')

# trying the accepted solution
> a.dt.tz_localize(None)
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

# Make it tz-naive. This is the solution:
> a.apply(lambda x:x.tz_localize(None))
0   2019-10-04 16:30:00
1   2019-10-07 16:00:00
2   2019-09-24 08:30:00
Name: localized, dtype: datetime64[ns]

# a.tz_convert() also does not work with multiple timezones, but this works:
> a.apply(lambda x:x.tz_convert('America/Los_Angeles'))
0   2019-10-04 07:30:00-07:00
1   2019-10-07 13:00:00-07:00
2   2019-09-24 08:30:00-07:00
Name: localized, dtype: datetime64[ns, America/Los_Angeles]

I think you can't achieve what you want in a more efficient manner than you proposed.

The underlying problem is that the timestamps (as you seem aware) are made up of two parts. The data that represents the UTC time, and the timezone, tz_info. The timezone information is used only for display purposes when printing the timezone to the screen. At display time, the data is offset appropriately and +01:00 (or similar) is added to the string. Stripping off the tz_info value (using tz_convert(tz=None)) doesn't doesn't actually change the data that represents the naive part of the timestamp.

So, the only way to do what you want is to modify the underlying data (pandas doesn't allow this... DatetimeIndex are immutable -- see the help on DatetimeIndex), or to create a new set of timestamp objects and wrap them in a new DatetimeIndex. Your solution does the latter:

pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])

For reference, here is the replace method of Timestamp (see tslib.pyx):

def replace(self, **kwds):
    return Timestamp(datetime.replace(self, **kwds),
                     offset=self.offset)

You can refer to the docs on datetime.datetime to see that datetime.datetime.replace also creates a new object.

If you can, your best bet for efficiency is to modify the source of the data so that it (incorrectly) reports the timestamps without their timezone. You mentioned:

I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on)

I'd be curious what extra hassle you are referring to. I recommend as a general rule for all software development, keep your timestamp 'naive values' in UTC. There is little worse than looking at two different int64 values wondering which timezone they belong to. If you always, always, always use UTC for the internal storage, then you will avoid countless headaches. My mantra is Timezones are for human I/O only.


Building on D.A.'s suggestion that "the only way to do what you want is to modify the underlying data" and using numpy to modify the underlying data...

This works for me, and is pretty fast:

def tz_to_naive(datetime_index):
    """Converts a tz-aware DatetimeIndex into a tz-naive DatetimeIndex,
    effectively baking the timezone into the internal representation.

    Parameters
    ----------
    datetime_index : pandas.DatetimeIndex, tz-aware

    Returns
    -------
    pandas.DatetimeIndex, tz-naive
    """
    # Calculate timezone offset relative to UTC
    timestamp = datetime_index[0]
    tz_offset = (timestamp.replace(tzinfo=None) - 
                 timestamp.tz_convert('UTC').replace(tzinfo=None))
    tz_offset_td64 = np.timedelta64(tz_offset)

    # Now convert to naive DatetimeIndex
    return pd.DatetimeIndex(datetime_index.values + tz_offset_td64)

Because I always struggle to remember, a quick summary of what each of these do:

>>> pd.Timestamp.now()  # naive local time
Timestamp('2019-10-07 10:30:19.428748')

>>> pd.Timestamp.utcnow()  # tz aware UTC
Timestamp('2019-10-07 08:30:19.428748+0000', tz='UTC')

>>> pd.Timestamp.now(tz='Europe/Brussels')  # tz aware local time
Timestamp('2019-10-07 10:30:19.428748+0200', tz='Europe/Brussels')

>>> pd.Timestamp.now(tz='Europe/Brussels').tz_localize(None)  # naive local time
Timestamp('2019-10-07 10:30:19.428748')

>>> pd.Timestamp.now(tz='Europe/Brussels').tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

>>> pd.Timestamp.utcnow().tz_localize(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

>>> pd.Timestamp.utcnow().tz_convert(None)  # naive UTC
Timestamp('2019-10-07 08:30:19.428748')

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float

Examples related to datetime

Comparing two joda DateTime instances How to format DateTime in Flutter , How to get current time in flutter? How do I convert 2018-04-10T04:00:00.000Z string to DateTime? How to get current local date and time in Kotlin Converting unix time into date-time via excel Convert python datetime to timestamp in milliseconds SQL Server date format yyyymmdd Laravel Carbon subtract days from current date Check if date is a valid one Why is ZoneOffset.UTC != ZoneId.of("UTC")?

Examples related to timezone

How to set the timezone in Django? How to convert Moment.js date to users local timezone? What does this format means T00:00:00.000Z? How do I get the current timezone name in Postgres 9.3? MySQL JDBC Driver 5.1.33 - Time Zone Issue Python get current time in right timezone Symfony2 and date_default_timezone_get() - It is not safe to rely on the system's timezone settings PHP date() with timezone? How to store a datetime in MySQL with timezone info Java SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'") gives timezone as IST