How to change a dataframe column from String type to Double type in PySpark

Question

I have a dataframe with a column of String type. I want to change the column type to Double in PySpark. This is what I did:

    toDoublefunc = UserDefinedFunction(lambda x: x, DoubleType())
    changedTypedf = joindf.withColumn("label", toDoublefunc(joindf['show']))

Is this the right way to do it? I am getting an error when running the result through Logistic Regression, so I wonder whether this is the cause of the trouble.

User · Answer

The solution was simple:

    toDoublefunc = UserDefinedFunction(lambda x: float(x), DoubleType())
    changedTypedf = joindf.withColumn("label", toDoublefunc(joindf['show']))

User · Answer

The given answers are enough to deal with the problem, but I want to share another way, which may have been introduced in a newer version of Spark (I am not sure about it), so the other answers did not cover it. A column can be referenced in a Spark statement with col("column_name"):

    from pyspark.sql.functions import col
    changedTypedf = joindf.withColumn("show", col("show").cast("double"))

User · Answer

There is no need for a UDF here. Column already provides a cast method that accepts a DataType instance:

    from pyspark.sql.types import DoubleType
    changedTypedf = joindf.withColumn("label", joindf["show"].cast(DoubleType()))

or a short string:

    changedTypedf = joindf.withColumn("label", joindf["show"].cast("double"))

where the canonical string names (other variations can be supported as well) correspond to the simpleString value. So for atomic types:

    from pyspark.sql import types

    for t in ['BinaryType', 'BooleanType', 'ByteType', 'DateType',
              'DecimalType', 'DoubleType', 'FloatType', 'IntegerType',
              'LongType', 'ShortType', 'StringType', 'TimestampType']:
        print(f"{t}: {getattr(types, t)().simpleString()}")

    BinaryType: binary
    BooleanType: boolean
    ByteType: tinyint
    DateType: date
    DecimalType: decimal(10,0)
    DoubleType: double
    FloatType: float
    IntegerType: int
    LongType: bigint
    ShortType: smallint
    StringType: string
    TimestampType: timestamp

and, for example, complex types:

    types.ArrayType(types.IntegerType()).simpleString()
    # 'array<int>'
    types.MapType(types.StringType(), types.IntegerType()).simpleString()
    # 'map<string,int>'

User · Answer

Preserve the name of the column and avoid adding an extra column by using the same name as the input column:

    changedTypedf = joindf.withColumn("show", joindf["show"].cast(DoubleType()))

User · Answer

PySpark version:

    df = <source data>
    df.printSchema()

    from pyspark.sql.types import IntegerType

    # Change column type
    df_new = df.withColumn("myColumn", df["myColumn"].cast(IntegerType()))
    df_new.printSchema()
    df_new.select("myColumn").show()
