dt
accessorA common source of confusion revolves around when to use .year
and when to use .dt.year
.
The former is an attribute for pd.DatetimeIndex
objects; the latter for pd.Series
objects. Consider this dataframe:
df = pd.DataFrame({'Dates': pd.to_datetime(['2018-01-01', '2018-10-20', '2018-12-25'])},
index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']))
The definition of the series and index look similar, but the pd.DataFrame
constructor converts them to different types:
type(df.index) # pandas.tseries.index.DatetimeIndex
type(df['Dates']) # pandas.core.series.Series
The DatetimeIndex
object has a direct year
attribute, while the Series
object must use the dt
accessor. Similarly for month
:
df.index.month # array([1, 1, 1])
df['Dates'].dt.month.values # array([ 1, 10, 12], dtype=int64)
A subtle but important difference worth noting is that df.index.month
gives a NumPy array, while df['Dates'].dt.month
gives a Pandas series. Above, we use pd.Series.values
to extract the NumPy array representation.