You can use the date functions in pyspark.sql.functions, such as year, month, etc.
For reference, see: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
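from pyspark.sql.functions import (
    year, month, dayofmonth, dayofyear, hour, minute, weekofyear, unix_timestamp,
)

# Extract each date/time component of the `date` column into its own column.
newdf = elevDF.select(
    year(elevDF.date).alias('dt_year'),
    month(elevDF.date).alias('dt_month'),
    dayofmonth(elevDF.date).alias('dt_day'),
    dayofyear(elevDF.date).alias('dt_dayofy'),
    hour(elevDF.date).alias('dt_hour'),
    minute(elevDF.date).alias('dt_min'),
    weekofyear(elevDF.date).alias('dt_week_no'),
    unix_timestamp(elevDF.date).alias('dt_int'),  # seconds since the Unix epoch
)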
newdf.show()
+-------+--------+------+---------+-------+------+----------+----------+
|dt_year|dt_month|dt_day|dt_dayofy|dt_hour|dt_min|dt_week_no|    dt_int|
+-------+--------+------+---------+-------+------+----------+----------+
|   2015|       9|     6|      249|      0|     0|        36|1441497601|
|   2015|       9|     6|      249|      0|     0|        36|1441497601|
|   2015|       9|     6|      249|      0|     0|        36|1441497603|
|   2015|       9|     6|      249|      0|     1|        36|1441497694|
|   2015|       9|     6|      249|      0|    20|        36|1441498808|
|   2015|       9|     6|      249|      0|    20|        36|1441498811|
|   2015|       9|     6|      249|      0|    20|        36|1441498815|
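
If you want to sanity-check what each column means, the same fields can be recomputed from dt_int with plain Python. This is only a rough sketch: `date_parts` is a hypothetical helper (not part of PySpark), and it assumes the Spark session time zone was UTC, which the output above is consistent with but which is not stated anywhere. The week number uses the ISO 8601 definition, which is what weekofyear follows.

```python
from datetime import datetime, timezone


def date_parts(epoch_seconds):
    """Recompute the columns from the select for a Unix timestamp, read as UTC.

    Hypothetical helper for cross-checking only; it is not a PySpark API.
    """
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "dt_year": dt.year,
        "dt_month": dt.month,
        "dt_day": dt.day,
        "dt_dayofy": dt.timetuple().tm_yday,  # day of the year, 1-based
        "dt_hour": dt.hour,
        "dt_min": dt.minute,
        "dt_week_no": dt.isocalendar()[1],    # ISO 8601 week number
        "dt_int": int(dt.timestamp()),        # back to epoch seconds
    }


# dt_int values taken from the first and fifth rows of the output.
first_row = date_parts(1441497601)
fifth_row = date_parts(1441498808)
print(first_row)
print(fifth_row)
```

If your session runs in a different time zone, the hour (and possibly the day) will be shifted accordingly, so pass the matching tz instead of UTC when comparing.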