Convert pyspark string to date format

Question

I have a date pyspark dataframe with a string column in the format of MM-dd-yyyy and I am attempting to convert this into a date column.

I tried:

df.select(to_date(df.STRING_COLUMN).alias('new_date')).show()

and I get a string of nulls. Can anyone help?

User · Answer

There aren't many answers here yet, so I'm sharing my code in case it helps someone.

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName("Python Spark SQL basic example")\
    .config("spark.some.config.option", "some-value").getOrCreate()


df = spark.createDataFrame([('2019-06-22',)], ['t'])
df1 = df.select(to_date(df.t, 'yyyy-MM-dd').alias('dt'))
# Printing the DataFrame shows its schema; show() prints the rows itself
print(df1)
df1.show()

Output:

DataFrame[dt: date]
+----------+
|        dt|
+----------+
|2019-06-22|
+----------+

The code above converts a string to a date. If you need a datetime instead, use to_timestamp. Let me know if you have any questions.

User · Answer

Try this:

df = spark.createDataFrame([('2018-07-27 10:30:00',)], ['Date_col'])
df.select(from_unixtime(unix_timestamp(df.Date_col, 'yyyy-MM-dd HH:mm:ss')).alias('dt_col'))
df.show()

+-------------------+
|           Date_col|
+-------------------+
|2018-07-27 10:30:00|
+-------------------+

User · Answer

from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

# Create a dummy dataframe
df1 = sqlContext.createDataFrame([("11/25/1991", "11/24/1991", "11/30/1991"),
                                  ("11/25/1391", "11/24/1992", "11/30/1992")],
                                 schema=['first', 'second', 'third'])

# Define a user-defined function
# that converts the string cell into a date
func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y'), DateType())

df = df1.withColumn('test', func(col('first')))

df.show()

df.printSchema()

Here is the output:

+----------+----------+----------+----------+
|     first|    second|     third|      test|
+----------+----------+----------+----------+
|11/25/1991|11/24/1991|11/30/1991|1991-01-25|
|11/25/1391|11/24/1992|11/30/1992|1391-01-17|
+----------+----------+----------+----------+

root
 |-- first: string (nullable = true)
 |-- second: string (nullable = true)
 |-- third: string (nullable = true)
 |-- test: date (nullable = true)
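In the output above, the test column shows January for inputs that say November. One possible cause, and this is only an assumption about how that output was produced, is a strptime format using %M (minutes) where %m (month) was intended. The plain-Python sketch below shows how the two directives differ; the helper name parse_us_date is hypothetical and not part of the original answer.

```python
from datetime import datetime


def parse_us_date(text, fmt="%m/%d/%Y"):
    """Parse a date string with strptime and return a datetime.date."""
    return datetime.strptime(text, fmt).date()


# %m reads the leading 11 as the month.
month_parsed = parse_us_date("11/25/1991")

# %M reads the leading 11 as minutes, so the month falls back to its default (January).
minute_parsed = parse_us_date("11/25/1991", "%M/%d/%Y")

print(month_parsed)
print(minute_parsed)
```

Checking the format string on a single value in plain Python like this, before wrapping it in a udf, is a cheap way to catch a wrong directive.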

User · Answer

Update (1/10/2018):

For Spark 2.2+, the best way to do this is probably to use the to_date or to_timestamp functions, both of which support the format argument. From the docs:

>>> from pyspark.sql.functions import to_timestamp
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()
[Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]

Original Answer (for Spark < 2.2):

It is possible (preferable?) to do this without a udf:

from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.createDataFrame(
    [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)],
    ['date_str']
)

df2 = df.select(
    'date_str',
    from_unixtime(unix_timestamp('date_str', 'MM/dd/yyyy')).alias('date')
)

print(df2)
# DataFrame[date_str: string, date: timestamp]

df2.show(truncate=False)
# +----------+-------------------+
# |date_str  |date               |
# +----------+-------------------+
# |11/25/1991|1991-11-25 00:00:00|
# |11/24/1991|1991-11-24 00:00:00|
# |11/30/1991|1991-11-30 00:00:00|
# +----------+-------------------+

User · Answer

The strptime() approach does not work for me. I found a cleaner solution, using cast:

from pyspark.sql.types import DateType
spark_df1 = spark_df.withColumn("record_date", spark_df['order_submitted_date'].cast(DateType()))
# below is the result
spark_df1.select('order_submitted_date', 'record_date').show(10, False)

+---------------------+-----------+
|order_submitted_date |record_date|
+---------------------+-----------+
|2015-08-19 12:54:16.0|2015-08-19 |
|2016-04-14 13:55:50.0|2016-04-14 |
|2013-10-11 18:23:36.0|2013-10-11 |
|2015-08-19 20:18:55.0|2015-08-19 |
|2015-08-20 12:07:40.0|2015-08-20 |
|2013-10-11 21:24:12.0|2013-10-11 |
|2013-10-11 23:29:28.0|2013-10-11 |
|2015-08-20 16:59:35.0|2015-08-20 |
|2015-08-20 17:32:03.0|2015-08-20 |
|2016-04-13 16:56:21.0|2016-04-13 |
+---------------------+-----------+

User · Answer

In the accepted answer's update you don't see the example for the to_date function, so another solution using it would be:

from pyspark.sql import functions as F

df = df.withColumn(
    'new_date',
    F.to_date(
        F.unix_timestamp('STRINGCOLUMN', 'MM-dd-yyyy').cast('timestamp')))
