Actually, we really do not need to import any python library. We can separate the year, month, date using simple SQL. See the below example,
+----------+
| _c0|
+----------+
|1872-11-30|
|1873-03-08|
|1874-03-07|
|1875-03-06|
|1876-03-04|
|1876-03-25|
|1877-03-03|
|1877-03-05|
|1878-03-02|
|1878-03-23|
|1879-01-18|
I have a date column in my data frame which contains the date, month and year and assume I want to extract only the year from the column.
df.createOrReplaceTempView("res")
sqlDF = spark.sql("SELECT EXTRACT(year from `_c0`) FROM res ")
Here I'm creating a temporary view and store the year values using this single line and the output will be,
+-----------------------+
|year(CAST(_c0 AS DATE))|
+-----------------------+
| 1872|
| 1873|
| 1874|
| 1875|
| 1876|
| 1876|
| 1877|
| 1877|
| 1878|
| 1878|
| 1879|
| 1879|
| 1879|