How to export a table dataframe in PySpark to csv

Question

I am using Spark 1.3.1 (PySpark) and I have generated a table using a SQL query. I now have an object that is a DataFrame. I want to export this DataFrame object (I have called it "table") to a csv file so I can manipulate it and plot the columns. How do I export the DataFrame "table" to a csv file? Thanks!

User · Answer

If you cannot use spark-csv, you can do the following:

    df.rdd.map(lambda x: ",".join(map(str, x))).coalesce(1).saveAsTextFile("file.csv")

If you need to handle strings with linebreaks or commas, that will not work. Use this instead (written for Python 2, where cStringIO is available):

    import csv
    import cStringIO

    def row2csv(row):
        buffer = cStringIO.StringIO()
        writer = csv.writer(buffer)
        writer.writerow([str(s).encode("utf-8") for s in row])
        buffer.seek(0)
        return buffer.read().strip()

    df.rdd.map(row2csv).coalesce(1).saveAsTextFile("file.csv")
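For readers on Python 3, where cStringIO no longer exists, the same idea can be sketched with io.StringIO. This is only a minimal adaptation of the function above, not something from the original answer. It assumes each row is an iterable of values (a PySpark Row is a tuple, so it qualifies), and it relies on csv.writer to stringify non-string values and to write None as an empty field.

```python
import csv
import io


def row2csv(row):
    """Format one row (any iterable of values) as a single CSV line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # csv.writer quotes fields that contain commas, quotes or line breaks.
    writer.writerow(list(row))
    # Drop only the trailing line terminator added by the writer;
    # saveAsTextFile appends its own newline after every element.
    return buffer.getvalue().rstrip("\r\n")


# Hypothetical usage with a DataFrame named df:
# df.rdd.map(row2csv).coalesce(1).saveAsTextFile("file.csv")
```

Because the function only touches the standard library, it can be tried on plain tuples before being mapped over df.rdd.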

User · Answer

You need to repartition the DataFrame into a single partition, then specify the format, the path (in Unix file system format) and any other parameters for the output file:

    df.repartition(1).write.format("com.databricks.spark.csv").save("/path/to/file/myfile.csv", header="true")

However, repartition is a costly function, and toPandas() is worse. Try using .coalesce(1) instead of .repartition(1) in the command above for better performance.

User · Answer

If the data frame fits in driver memory and you want to save to the local file system, you can convert the Spark DataFrame to a local Pandas DataFrame using the toPandas method and then simply use to_csv:

    df.toPandas().to_csv("mycsv.csv")

Otherwise you can use spark-csv:

Spark 1.3

    df.save("mycsv.csv", "com.databricks.spark.csv")

Spark 1.4+

    df.write.format("com.databricks.spark.csv").save("mycsv.csv")

In Spark 2.0+ you can use the csv data source directly:

    df.write.csv("mycsv.csv")

User · Answer

How about this (if you don't want a one-liner)?

    for row in df.collect():
        d = row.asDict()
        s = "%d\t%s\t%s\n" % (d["int_column"], d["string_column"], d["string_column"])
        f.write(s)

Here f is an opened file descriptor. The separator is a TAB character, but it is easy to change to whatever you want.
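A variation on the loop above, sketched with the standard csv module so that values containing the separator get quoted instead of silently breaking the columns. The helper name write_rows and its parameters are made up for illustration. It assumes the rows are dict-like, which is what row.asDict() returns, and that the whole result fits in driver memory, as with any collect().

```python
import csv


def write_rows(rows, path, columns, delimiter="\t"):
    """Write an iterable of dict-like rows to path, one line per row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        writer.writerow(columns)  # header line
        for row in rows:
            writer.writerow([row[name] for name in columns])


# Hypothetical usage with a DataFrame named df:
# write_rows((r.asDict() for r in df.collect()), "table.tsv", df.columns)
```

Passing delimiter="," produces an ordinary comma-separated file.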

User · Answer

For Apache Spark 2+, to save a dataframe into a single csv file, use the following command:

    query.repartition(1).write.csv("cc_out.csv", sep="<separator>")

Here 1 indicates that I need only one partition of the csv, and sep sets the field separator. You can change both according to your requirements.
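One thing to keep in mind with the single-partition approach: Spark treats the path you pass as a directory and writes a part-* file inside it, next to marker files such as _SUCCESS. A small helper like the one below can pull that part file out under the name you actually want. This is only a sketch: the function name collect_single_csv is made up, and it assumes the output directory is on the local file system and holds exactly one part file.

```python
import glob
import os
import shutil


def collect_single_csv(output_dir, target_path):
    """Move the only part file in a Spark output directory to target_path."""
    # Marker files (_SUCCESS) and hidden checksum files do not match "part-*".
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    if len(part_files) != 1:
        raise ValueError(
            "expected exactly one part file in %s, found %d"
            % (output_dir, len(part_files))
        )
    shutil.move(part_files[0], target_path)
    return target_path


# Hypothetical usage after the write above has finished:
# collect_single_csv("cc_out.csv", "cc_out_single.csv")
```

If the job was written with more than one partition, the helper raises instead of guessing which file to keep.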
