Renaming column names of a DataFrame in Spark Scala

Question

I am trying to convert all the headers / column names of a DataFrame in Spark-Scala. So far I have come up with the following code, which only replaces a single column name:

for (i <- 0 to origCols.length - 1) {
  df.withColumnRenamed(
    df.columns(i),
    df.columns(i).toLowerCase
  )
}
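The loop in the question has no lasting effect because `withColumnRenamed` returns a new DataFrame instead of modifying `df`, and that return value is discarded on every iteration. Below is a minimal sketch of one way to lower-case every column name in a single pass. The local `SparkSession` and the two-column sample DataFrame are assumptions made for illustration; they are not part of the question.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("lowercase-columns-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with mixed-case headers.
val df = Seq((1, "a"), (2, "b")).toDF("ID", "UserName")

// toDF returns a new DataFrame, so the result must be captured in a value.
val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)

lowered.printSchema()
```

Passing all new names to `toDF` at once avoids building one intermediate DataFrame per column, which is what a chain of `withColumnRenamed` calls does.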

User · Answer

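When joining two tables, rename every column except the join key. Both methods below assume day1 is declared as a var and key holds the name of the join column: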

// method 1: create a new DF
day1 = day1.toDF(day1.columns.map(x => if (x.equals(key)) x else s"${x}_d1"): _*)

// method 2: use withColumnRenamed
for ((x, y) <- day1.columns.filter(!_.equals(key)).map(x => (x, s"${x}_d1"))) {
    day1 = day1.withColumnRenamed(x, y)
}

works!
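To show why the key is left untouched, here is a sketch of the complete join. The two snapshots `day1` and `day2`, the key name `id`, and the helper `suffixExcept` are hypothetical names introduced only for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("join-rename-sketch")
  .getOrCreate()
import spark.implicits._

val key = "id"
// Hypothetical daily snapshots that share every column name.
val day1 = Seq((1, 10.0), (2, 20.0)).toDF("id", "price")
val day2 = Seq((1, 11.0), (2, 19.0)).toDF("id", "price")

// Append a suffix to every column except the join key.
def suffixExcept(df: DataFrame, key: String, suffix: String): DataFrame =
  df.toDF(df.columns.map(c => if (c == key) c else s"$c$suffix"): _*)

// Joining on Seq(key) keeps a single copy of the key column.
val joined = suffixExcept(day1, key, "_d1")
  .join(suffixExcept(day2, key, "_d2"), Seq(key))

joined.printSchema()
```

Because the key keeps the same name on both sides, it can be passed as a join column name, and the remaining columns stay distinguishable by their suffixes.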

User · Answer

If the structure is flat:

// Assumes spark-shell; otherwise import spark.implicits._
// and org.apache.spark.sql.functions.{col, struct}.
val df = Seq((1L, "a", "foo", 3.0)).toDF
df.printSchema
// root
//  |-- _1: long (nullable = false)
//  |-- _2: string (nullable = true)
//  |-- _3: string (nullable = true)
//  |-- _4: double (nullable = false)

the simplest thing you can do is to use the toDF method:

val newNames = Seq("id", "x1", "x2", "x3")
val dfRenamed = df.toDF(newNames: _*)

dfRenamed.printSchema
// root
//  |-- id: long (nullable = false)
//  |-- x1: string (nullable = true)
//  |-- x2: string (nullable = true)
//  |-- x3: double (nullable = false)

If you want to rename individual columns, you can use either select with alias:

df.select($"_1".alias("x1"))

which can be easily generalized to multiple columns:

val lookup = Map("_1" -> "foo", "_3" -> "bar")

df.select(df.columns.map(c => col(c).as(lookup.getOrElse(c, c))): _*)

or withColumnRenamed:

df.withColumnRenamed("_1", "x1")

which can be combined with foldLeft to rename multiple columns:

lookup.foldLeft(df)((acc, ca) => acc.withColumnRenamed(ca._1, ca._2))

With nested structures (structs), one possible option is renaming by selecting a whole structure:

val nested = spark.read.json(sc.parallelize(Seq(
  """{"foobar": {"foo": {"bar": {"first": 1.0, "second": 2.0}}}, "id": 1}"""
)))

nested.printSchema
// root
//  |-- foobar: struct (nullable = true)
//  |    |-- foo: struct (nullable = true)
//  |    |    |-- bar: struct (nullable = true)
//  |    |    |    |-- first: double (nullable = true)
//  |    |    |    |-- second: double (nullable = true)
//  |-- id: long (nullable = true)

@transient val foobarRenamed = struct(
  struct(
    struct(
      $"foobar.foo.bar.first".as("x"), $"foobar.foo.bar.second".as("y")
    ).alias("point")
  ).alias("location")
).alias("record")

nested.select(foobarRenamed, $"id").printSchema
// root
//  |-- record: struct (nullable = false)
//  |    |-- location: struct (nullable = false)
//  |    |    |-- point: struct (nullable = false)
//  |    |    |    |-- x: double (nullable = true)
//  |    |    |    |-- y: double (nullable = true)
//  |-- id: long (nullable = true)

Note that this may affect nullability metadata. Another possibility is to rename by casting:

nested.select($"foobar".cast(
  "struct<location:struct<point:struct<x:double,y:double>>>"
).alias("record")).printSchema
// root
//  |-- record: struct (nullable = true)
//  |    |-- location: struct (nullable = true)
//  |    |    |-- point: struct (nullable = true)
//  |    |    |    |-- x: double (nullable = true)
//  |    |    |    |-- y: double (nullable = true)

or:

import org.apache.spark.sql.types._

nested.select($"foobar".cast(
  StructType(Seq(
    StructField("location", StructType(Seq(
      StructField("point", StructType(Seq(
        StructField("x", DoubleType), StructField("y", DoubleType)))))))))
).alias("record")).printSchema
// root
//  |-- record: struct (nullable = true)
//  |    |-- location: struct (nullable = true)
//  |    |    |-- point: struct (nullable = true)
//  |    |    |    |-- x: double (nullable = true)
//  |    |    |    |-- y: double (nullable = true)

User · Answer

Suppose the DataFrame df has 3 columns id1, name1, price1 and you wish to rename them to id2, name2, price2:

val list = List("id2", "name2", "price2")
import spark.implicits._
val df2 = df.toDF(list: _*)
df2.columns.foreach(println)

I found this approach useful in many cases.

User · Answer

For those of you interested in the PySpark version (it is actually the same in Scala):

merchants_df_renamed = merchants_df.toDF(
    'merchant_id', 'category', 'subcategory', 'merchant')

merchants_df_renamed.printSchema()

Result:

root
 |-- merchant_id: integer (nullable = true)
 |-- category: string (nullable = true)
 |-- subcategory: string (nullable = true)
 |-- merchant: string (nullable = true)

User · Answer

def aliasAllColumns(t: DataFrame, p: String = "", s: String = ""): DataFrame = {
  t.select(t.columns.map { c => t.col(c).as(p + c + s) }: _*)
}

In case it isn't obvious, this adds a prefix and a suffix to each of the current column names. This can be useful when you have two tables with one or more columns having the same name, and you wish to join them but still be able to disambiguate the columns in the resultant table. It sure would be nice if there were a similar way to do this in "normal" SQL.
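As a usage sketch for the join scenario described above: the tables `orders` and `users`, their columns, and the prefixes `o_` and `u_` are hypothetical and chosen only to show the disambiguation. The helper is repeated so the snippet runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alias-all-columns-sketch")
  .getOrCreate()
import spark.implicits._

// Same helper as in the answer, repeated to keep the snippet self-contained.
def aliasAllColumns(t: DataFrame, p: String = "", s: String = ""): DataFrame =
  t.select(t.columns.map(c => t.col(c).as(p + c + s)): _*)

// Two hypothetical tables that share both column names.
val orders = Seq((1, "open")).toDF("id", "status")
val users = Seq((1, "active")).toDF("id", "status")

// Prefix each side before joining so every output column is unambiguous.
val o = aliasAllColumns(orders, p = "o_")
val u = aliasAllColumns(users, p = "u_")
val joined = o.join(u, o("o_id") === u("u_id"))

joined.printSchema()
```

After the join, `o_status` and `u_status` can be selected by name without qualifying them by their source DataFrame.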

User · Answer

Sometimes column names in a SQL Server or MySQL table contain spaces, for example: Account Number, customer number. Hive tables do not support column names containing spaces, so use the solution below to rename such columns.

Solution:

// df must be declared as a var for the reassignment below to compile
val renamedColumns = df.columns.map(c => df(c).as(c.replaceAll(" ", "_").toLowerCase()))
df = df.select(renamedColumns: _*)
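The same idea can be written without reassigning a variable. The following is a sketch under stated assumptions: the sample column names and the helper `hiveSafe` are hypothetical, and collapsing runs of whitespace into a single underscore is a small variation on the answer, which replaces each space individually.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hive-safe-names-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical source table whose headers contain spaces.
val raw = Seq((1001, "Alice")).toDF("Account Number", "customer name")

// Trim, replace each run of whitespace with one underscore, then lower-case.
def hiveSafe(name: String): String =
  name.trim.replaceAll("\\s+", "_").toLowerCase

// toDF returns a new DataFrame, so no var is needed.
val cleaned = raw.toDF(raw.columns.map(hiveSafe): _*)

cleaned.printSchema()
```

Keeping the name cleanup in a plain `String => String` function makes it easy to check separately from any DataFrame.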
