How can I change column types in Spark SQL's DataFrame?

The solution to "How can I change column types in Spark SQL's DataFrame?" is:


Edit: Newest version

Since Spark 2.x you can use .withColumn together with .cast. Check the docs here:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame
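For example, with the Spark 2.x API you can cast a column in place by writing it back under the same name (a minimal sketch; the column names and sample data here are assumptions for illustration, and running it requires a Spark runtime):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cast-example")
  .getOrCreate()
import spark.implicits._

// Toy DataFrame where "year" starts out as a string column
val df = Seq(("2012", "Tesla"), ("2015", "Ford")).toDF("year", "make")

// withColumn with an existing name replaces that column,
// so no temporary column, drop, or rename is needed
val df2 = df.withColumn("year", col("year").cast("int"))

df2.printSchema()  // "year" is now IntegerType
```

Passing the type as the string "int" is equivalent to casting with IntegerType; both forms are accepted by Column.cast.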

Older answer

Since Spark version 1.4 you can apply the cast method with DataType on the column:

import org.apache.spark.sql.types.IntegerType

// Cast into a temporary column, then drop the original and rename
val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
    .drop("year")
    .withColumnRenamed("yearTmp", "year")

If you are using SQL expressions, you can also do:

val df2 = df.selectExpr("cast(year as int) year", 
                        "make", 
                        "model", 
                        "comment", 
                        "blank")
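The same cast can also be expressed as a full SQL query against a temporary view (a sketch assuming a SparkSession named `spark` and a DataFrame `df` with the columns above; the view name "cars" is made up for the example):

```scala
// Register the DataFrame so it can be queried by name
df.createOrReplaceTempView("cars")

// CAST in plain SQL; all other columns pass through unchanged
val df2 = spark.sql(
  "SELECT CAST(year AS INT) AS year, make, model, comment, blank FROM cars")
```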

For more info, check the docs: http://spark.apache.org/docs/1.6.0/api/scala/#org.apache.spark.sql.DataFrame

~ Answered on 2015-10-29 20:27:50
