I have an array of datetime64 type:
dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')
Is there a better way than looping through each element just to get an np.array of years:
years = f(dates)
# desired output:
array([2010, 2011, 2012], dtype=int16)  # or dtype=str
I'm using stable numpy version 1.6.2.
As datetime support is not yet stable in numpy, I would use pandas for this:
In [52]: import pandas as pd
In [53]: dates = pd.DatetimeIndex(['2010-10-17', '2011-05-13', "2012-01-15"])
In [54]: dates.year
Out[54]: array([2010, 2011, 2012], dtype=int32)
Pandas uses numpy datetime internally, but seems to avoid the shortcomings that numpy has had up to now.
Anon's answer works great for me, but I needed to modify the statement for days
from:
days = dates - dates.astype('datetime64[M]') + 1
to:
days = dates.astype('datetime64[D]') - dates.astype('datetime64[M]') + 1
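To see why the change matters, here is a small sketch using hypothetical minute-resolution timestamps (the example data is my own, not from either answer). With finer-than-day units, the original subtraction keeps minute units, so "+ 1" adds one minute rather than one day; truncating to days first gives day-of-month values.

```python
import numpy as np

# Hypothetical minute-resolution timestamps (not from the original answers)
dates = np.array(['2010-10-17T12:30', '2011-05-13T08:00', '2012-01-15T23:59'],
                 dtype='datetime64[m]')

# Original statement: the difference stays in minutes, so "+ 1" adds one minute
naive = dates - dates.astype('datetime64[M]') + 1

# Modified statement: truncate to days first, so the result is a timedelta in days
days = dates.astype('datetime64[D]') - dates.astype('datetime64[M]') + 1

print(naive.dtype)        # timedelta64[m]
print(days.astype(int))   # [17 13 15]
```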
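If you upgrade to numpy 1.7 (where datetime is still labeled as experimental), the following should work:
dates/np.timedelta64(1,'Y')
On current NumPy releases this division raises a TypeError, because datetime64 values cannot be divided; dates.astype('datetime64[Y]').astype(int) + 1970 gives the years instead.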
Another possibility is:
np.datetime64(dates,'Y') returns numpy.datetime64('2010')
or
np.datetime64(dates,'Y').astype(int)+1970 returns 2010
but this works only on scalar values; it won't take an array.
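For arrays, the same unit conversion can be done element-wise with astype() instead of the np.datetime64 constructor. A minimal sketch, assuming day-resolution input:

```python
import numpy as np

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64[D]')

# Scalar form from the answer above: works on a single datetime64 value
one = np.datetime64(dates[0], 'Y')        # numpy.datetime64('2010')
one_year = one.astype(int) + 1970         # years since 1970, shifted back: 2010

# Array form: astype() applies the same unit conversion to every element
years = dates.astype('datetime64[Y]').astype(int) + 1970
print(years)                              # [2010 2011 2012]
```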
There's no direct way to do it yet, unfortunately, but there are a couple of indirect ways:
[dt.year for dt in dates.astype(object)]
or
[datetime.datetime.strptime(repr(d), "%Y-%m-%d %H:%M:%S").year for d in dates]
Both are inspired by the examples here.
Both of these work for me on Numpy 1.6.1. You may need to be a bit more careful with the second one, since the repr() of a datetime64 might have a fractional part after the decimal point.
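On current NumPy releases the first approach still works as written, but repr() now produces text of the form numpy.datetime64('2010-10-17'), so the second one no longer parses. A sketch of both, assuming day-resolution input and swapping repr() for str() with a date-only format string (my adaptation, not the original answer):

```python
import datetime
import numpy as np

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64[D]')

# 1) astype(object) turns day-resolution datetime64 values into datetime.date objects
years_obj = [dt.year for dt in dates.astype(object)]

# 2) String parsing: str() of a day-unit datetime64 is 'YYYY-MM-DD',
#    so the format string must match that (assumption: day units only)
years_str = [datetime.datetime.strptime(str(d), "%Y-%m-%d").year for d in dates]

print(years_obj, years_str)
```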
Using numpy version 1.10.4 and pandas version 0.17.1,
dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype=np.datetime64)
pd.to_datetime(dates).year
I get what you're looking for:
array([2010, 2011, 2012], dtype=int32)
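The same DatetimeIndex also exposes month and day. A short sketch, assuming a reasonably recent pandas (Index.to_numpy() needs 0.24 or later, and the exact integer dtype of .year varies between versions):

```python
import numpy as np
import pandas as pd

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64[D]')

idx = pd.to_datetime(dates)        # DatetimeIndex
years = idx.year.to_numpy()        # plain numpy integer arrays
months = idx.month.to_numpy()
days = idx.day.to_numpy()

print(years, months, days)
```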
Use dates.tolist() to convert to native datetime objects, then simply access year. Example:
>>> dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')
>>> [x.year for x in dates.tolist()]
[2010, 2011, 2012]
This is basically the same idea exposed in https://stackoverflow.com/a/35281829/2192272, but using simpler syntax.
Tested with python 3.6 / numpy 1.18.
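One caveat worth knowing: Python's datetime only goes down to microseconds, so for nanosecond-resolution arrays tolist() returns plain integers (nanoseconds since the epoch) instead of datetime objects. A sketch that casts to microseconds first:

```python
import datetime
import numpy as np

# Nanosecond resolution cannot be represented by datetime.datetime
dates_ns = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64[ns]')
raw = dates_ns.tolist()            # Python ints: nanoseconds since the epoch

# Casting to microseconds first gives datetime.datetime objects again
as_dt = dates_ns.astype('datetime64[us]').tolist()
years = [x.year for x in as_dt]
print(years)                       # [2010, 2011, 2012]
```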
This is how I do it.
import numpy as np


def dt2cal(dt):
    """
    Convert array of datetime64 to a calendar array of year, month, day, hour,
    minute, seconds, microsecond with these quantities indexed on the last axis.

    Parameters
    ----------
    dt : datetime64 array (...)
        numpy.ndarray of datetimes of arbitrary shape

    Returns
    -------
    cal : uint32 array (..., 7)
        calendar array with last axis representing year, month, day, hour,
        minute, second, microsecond
    """
    # allocate output
    out = np.empty(dt.shape + (7,), dtype="u4")
    # decompose calendar floors
    Y, M, D, h, m, s = [dt.astype(f"M8[{x}]") for x in "YMDhms"]
    out[..., 0] = Y + 1970  # Gregorian year
    out[..., 1] = (M - Y) + 1  # month
    out[..., 2] = (D - M) + 1  # day
    out[..., 3] = (dt - D).astype("m8[h]")  # hour
    out[..., 4] = (dt - h).astype("m8[m]")  # minute
    out[..., 5] = (dt - m).astype("m8[s]")  # second
    out[..., 6] = (dt - s).astype("m8[us]")  # microsecond
    return out
It's vectorized across arbitrary input dimensions, it's fast, it's intuitive, it works on numpy v1.15.4, and it doesn't use pandas.
I really wish numpy supported this functionality, it's required all the time in application development. I always get super nervous when I have to roll my own stuff like this, I always feel like I'm missing an edge case.
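If only the date part is needed, the same floor-and-subtract idea can be trimmed down. The function name dt2ymd below is hypothetical, and it uses explicit .astype(int) casts rather than relying on assigning datetime64 values straight into an integer array:

```python
import numpy as np


def dt2ymd(dt):
    """Hypothetical trimmed-down variant of dt2cal: year, month, day only."""
    out = np.empty(dt.shape + (3,), dtype="u4")
    Y, M, D = [dt.astype(f"M8[{x}]") for x in "YMD"]
    out[..., 0] = Y.astype(int) + 1970        # Gregorian year
    out[..., 1] = (M - Y).astype(int) + 1     # month, 1-12
    out[..., 2] = (D - M).astype(int) + 1     # day of month, 1-31
    return out


# 2-D input: the calendar fields are stacked on a new last axis
dt = np.array([['2010-10-17', '2011-05-13'],
               ['2012-01-15', '2000-02-29']], dtype='datetime64[D]')
cal = dt2ymd(dt)
print(cal.shape)   # (2, 2, 3)
```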
There should be an easier way to do this, but, depending on what you're trying to do, the best route might be to convert to a regular Python datetime object:
import numpy as np
datetime64Obj = np.datetime64('2002-07-04T02:55:41-0700')
print(datetime64Obj.astype(object).year)
# 2002
print(datetime64Obj.astype(object).day)
# 4
Based on the comments below, this seems to work only in Python 2.7.x and Python 3.6+.
I find the following tricks give a 2x to 4x speed increase over the pandas method described above (i.e. pd.DatetimeIndex(dates).year etc.). I find the speed of [dt.year for dt in dates.astype(object)] to be similar to the pandas method. Also, these tricks can be applied directly to ndarrays of any shape (2D, 3D, etc.).
dates = np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = dates.astype('datetime64[M]').astype(int) % 12 + 1
days = dates - dates.astype('datetime64[M]') + 1
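A quick way to convince yourself the tricks are right is to compare them with Python's own date objects over the whole range. This sketch also casts the days timedelta to plain integers and applies the same expressions to a 2-D array (the 365 x 10 reshape is arbitrary):

```python
import numpy as np

dates = np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))

years = dates.astype('datetime64[Y]').astype(int) + 1970
months = dates.astype('datetime64[M]').astype(int) % 12 + 1
# (dates - month_start) is a timedelta64[D]; astype(int) gives plain day numbers
days = (dates - dates.astype('datetime64[M]')).astype(int) + 1

# Cross-check every element against Python's datetime.date objects
ref = dates.astype(object)
assert np.array_equal(years, [d.year for d in ref])
assert np.array_equal(months, [d.month for d in ref])
assert np.array_equal(days, [d.day for d in ref])

# The same expressions work unchanged on a 2-D array
grid = dates[:3650].reshape(365, 10)
grid_years = grid.astype('datetime64[Y]').astype(int) + 1970
```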
Source: Stackoverflow.com